Intermediate spreadsheet structure

ABSTRACT

An improved intermediate spreadsheet structure for representing n-dimensional spreadsheets being interchanged among spreadsheet programs. The intermediate spreadsheet structure represents a spreadsheet as a set of nested segments. Each non-empty cell of the spreadsheet is represented by a cell segment. All of the cells belonging to a first-dimensional element of the spreadsheet, such as a row, are contained in a vector segment representing the row; all of the vector segments representing elements of a second-dimensional element, such as a matrix, are contained in a vector segment representing the second-dimensional element. The same type of nesting is used with all higher-dimensional elements. Each segment further contains descriptors which define certain aspects of the segment's content. The cell segments may further contain an expression control and descriptors belonging to the expression control which define an expression. The descriptors belonging to the expression control define the expression's operands and an operator. Operands may be constants, references to other cells of the spreadsheet, or other expressions. Expressions may be nested to any practical depth. Other aspects of the spreadsheet specified by descriptors include the manner in which the spreadsheet and its contents are to be formatted when it is displayed, access control for portions of the spreadsheet, the data types of values, and rules for the order in which the values of the cells in the spreadsheet are computed.

CROSS REFERENCES TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. Ser. No. 679,675, filed 12/10/84, which issued on 6/14/88 as U.S. Pat. No. 4,751,740.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to structures used to transfer formatted information between data processing systems, and more particularly to structures used to transfer a spreadsheet from one spreadsheet processing system to another.

2. Description of the Prior Art

Spreadsheets may be created and manipulated using many different kinds of spreadsheet programs. Each program creates a form of spreadsheet which is specific to that program. Thus, if one person creates a spreadsheet and another person who has a different spreadsheet program wishes to use it, the spreadsheet must be translated from the structure required by the first spreadsheet program (the source structure) to the structure required by the second spreadsheet program (the target structure). Programs can of course be written to perform the translation, but there must then be a program for each source structure-target structure pair. To simplify the translation process, makers of spreadsheet programs developed intermediate spreadsheet structures specifically adapted to the exchange of information between spreadsheet programs. With such structures, it was necessary only to provide programs which translate between a given spreadsheet structure and the intermediate structure in each direction. An example of such an intermediate structure is the SYLK (Symbolic Link) file format developed for the Multiplan spreadsheet. The SYLK file format is described in detail in Appendix C of the Wang PC Multiplan Reference Guide, 1st ed., Dec. 1982, Wang Laboratories, Inc., Lowell, MA, manual number 700-8016.

While the SYLK file format works for its intended purpose, further development of spreadsheet programs has revealed certain limitations. For example, the SYLK file format can handle only two-dimensional spreadsheets, is limited to the expressions and expression notation found in the Multiplan spreadsheet, has a relatively small set of data types, and offers only limited control of spreadsheet formats. Moreover, the SYLK file format is not easily expanded to deal with new developments in spreadsheet programs. It is an object of the present invention to provide an intermediate spreadsheet structure which can represent spreadsheets of any dimensionality, which can represent any expressions or formats defined for spreadsheets, and which is easily expandable to deal with new developments.

SUMMARY OF THE INVENTION

The intermediate spreadsheet structure of the present invention may be used to represent spreadsheets having elements with a maximum dimensionality of n. The intermediate spreadsheet structure comprises a cell segment representing each non-empty cell in the spreadsheet; at least one first-dimensional vector segment, which represents a first-dimensional element of the spreadsheet and contains the cell segments for any non-empty cells belonging to that element; and, for each additional dimension m, where m is less than or equal to the maximum dimensionality n, a vector segment for each non-empty element of that dimension, which represents the non-empty element and contains the vector segments for the non-empty elements of the (m-1)th dimension of the spreadsheet.
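The nesting described above can be sketched in Python. This is a minimal illustration, not the structure's actual encoding: it assumes that a cell is addressed by a tuple of integer indices ordered from the highest dimension down (for a two-dimensional sheet, (row, column)), that empty cells are simply absent from the input, and that segments are modeled as hypothetical ("vector", index, contents) and ("cell", index, value) tuples.

```python
from collections import defaultdict
from typing import Any


def nest(cells: dict[tuple[int, ...], Any]) -> list[tuple]:
    """Group non-empty cells into nested vector segments.

    `cells` maps an address tuple, highest dimension first, to a cell value.
    Each level of nesting strips one dimension from the addresses; the
    innermost level yields ("cell", index, value) tuples and every outer
    level yields ("vector", index, contents) tuples. Elements with no
    non-empty cells never appear, because groups are created only from
    cells that are present.
    """
    if not cells:
        return []
    dims = len(next(iter(cells)))
    if dims == 1:
        return [("cell", addr[0], value) for addr, value in sorted(cells.items())]
    groups: dict[int, dict[tuple[int, ...], Any]] = defaultdict(dict)
    for addr, value in cells.items():
        groups[addr[0]][addr[1:]] = value
    return [("vector", index, nest(sub)) for index, sub in sorted(groups.items())]


# A two-dimensional sheet with three non-empty cells: two in row 1, one in row 4.
sheet = {(1, 1): 10, (1, 3): 20, (4, 2): "x"}
print(nest(sheet))
```

For this sheet the result contains two row vector segments (rows 1 and 4) and no segments at all for the empty rows in between; a three-element address would add one more level of vector segments in the same way.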

It is thus an object of the invention to provide improved interchange of spreadsheets among spreadsheet programs.

It is another object of the invention to provide an improved intermediate spreadsheet structure.

It is an additional object of the invention to provide an intermediate spreadsheet structure which can represent a spreadsheet having elements with a maximum dimensionality of n dimensions.

It is a further object of the invention to provide an intermediate spreadsheet structure with improved flexibility and expandability.

Other objects and advantages of the invention will be understood by those of ordinary skill in the art after referring to the detailed description of a preferred embodiment and the drawings. Particular attention is drawn to those portions of the Description beginning with Section 10 and to FIGS. 15-19.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of prior-art translation of document structures.

FIG. 2 is a block diagram of translation of document structures in the present invention.

FIG. 3 is a block diagram of a document translation system during translation from a source structure to an intermediate structure.

FIG. 4 is a block diagram of a document translation system during translation from an intermediate structure to a target structure.

FIG. 5 is a block diagram of a document translation system in a network.

FIG. 6 is an overview of the intermediate document structure of the present invention.

FIG. 7 is a detail of a text segment in the intermediate document structure of the present invention.

FIG. 8 is a detail of voice and binary segments in the intermediate document structure of the present invention.

FIG. 9 is a detail of a named text shelf segment in the intermediate document structure of the present invention.

FIG. 10 is a block diagram of a document with prior-art structure.

FIG. 11 is the document of FIG. 10 with the intermediate structure of the present invention.

FIG. 12 is a flow chart of a main translation loop for translating documents having the structure of the document of FIG. 10 into the intermediate structure of the present invention.

FIG. 13 is a detailed flow chart of the character processing step in the flow chart of FIG. 12.

FIG. 14 is a detailed flow chart of the attribute processing step in the flow chart of FIG. 13.

FIG. 15 is a diagram of the display of a two-dimensional spreadsheet.

FIG. 16 is a diagram of the representation of a two-dimensional spreadsheet structure using the intermediate spreadsheet structure of the invention.

FIG. 17 is a diagram of cell address descriptors.

FIG. 18 is a detailed diagram of the contents of a cell.

FIG. 19 is a simple example spreadsheet.

DESCRIPTION OF A PREFERRED EMBODIMENT

The following description of a preferred embodiment first describes implementations of the invention in a single stand-alone document processing system and in a network of document processing systems. It then describes a preferred embodiment of the intermediate document structure and, finally, provides an example of translation between the preferred embodiment of the intermediate document structure and a prior-art document structure.

1. Stand-alone Translation System of the Present Invention: FIGS. 3 and 4

A block diagram of a stand-alone system for document translation according to the present invention is presented in FIG. 3. The document translation system shown in that figure is implemented in a standard multi-user document processing system such as the Wang Laboratories, Inc. "ALLIANCE" (™) system. Such a document processing system commonly includes at least a mass storage device, such as a disk drive, for storing documents and document processing programs, a processor, and a local memory used by the processor to store data and programs while processing a document. In FIG. 3, these components are represented as document and program storage 303, processor 301, and processor local memory 313. Under control of a program, processor 301 may fetch data and programs from document and program storage 303 to local memory 313, may execute the programs and process the data in local memory as specified by the programs, and may store processed data in storage 303. Other components of the system, not important for the present discussion and therefore not shown in FIG. 3, may include terminals for the users and means for reading and writing floppy disks.

Translation is necessary in a document processing system of the type shown in FIG. 3 when a user of the system wishes to process a document having a document structure different from that used in the document processing system. Such a situation may arise when the user has a copy of the document on a floppy disk made by a different document processing system. In this case, the document must be read from the floppy disk into storage 303 and then translated into the proper form before further processing is possible. Translation using an intermediate structure takes place in two steps: from the first document structure to the intermediate structure, and from the intermediate structure to the second document structure. FIG. 3 shows the document processing system while executing the first step. Storage 303 contains document with structure A 305, document with intermediate structure I 307, and two programs: A-I extraction program 309 and I-B composition program 311. Program 309 is termed an extraction program because it extracts information from a document having structure A and produces a document containing the same information and having intermediate structure I. Program 311 is termed a composition program because it composes a document having structure B from the information contained in the document having structure I.

During the first step, processor local memory 313 contains four buffers, i.e., areas of memory in which data and programs relevant to the translation operation are stored during the translation operation. A buffer 315 contains the portion of document 305 which is currently being translated into the intermediate structure; I buffer 317 contains the result of the translation of the contents of A buffer 315 into the intermediate structure; state buffer 319 contains data which indicates the current state of the translation operation; and code buffer 321, finally, contains the code from program 309 which processor 301 is currently executing.

During translation from structure A to structure I, the system operates as follows: for each portion of document A 305 being translated, processor 301 moves the components of document A's structure containing the portion from storage 303 into A buffer 315. Processor 301 then begins translating the contents of A buffer 315 under control of code from program 309. If code other than what is presently in code buffer 321 is required to perform the translation, that code is copied from program 309 into code buffer 321. As processor 301 translates, it places the result in I buffer 317. When I buffer 317 is full, its contents are copied to document I 307; similarly, when a portion of document 305 which is not presently contained in A buffer 315 is required, the required portion of document A 305 is copied from storage 303 to A buffer 315.
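The buffering loop above can be sketched in Python. The sketch is a simplification under stated assumptions: the hypothetical `translate` callable is assumed to convert each A-buffer portion independently (a real extraction program would keep cross-portion context in state buffer 319), byte strings stand in for the documents, and the two buffer-size parameters are illustrative.

```python
from typing import Callable


def buffered_translate(document_a: bytes,
                       translate: Callable[[bytes], bytes],
                       a_buffer_size: int,
                       i_buffer_size: int) -> bytes:
    """Translate document_a portion by portion through two fixed-size buffers."""
    document_i = bytearray()  # stands in for document I 307 in storage 303
    i_buffer = bytearray()    # stands in for I buffer 317
    for start in range(0, len(document_a), a_buffer_size):
        # Copy the next required portion of document A into A buffer 315.
        a_buffer = document_a[start:start + a_buffer_size]
        i_buffer += translate(a_buffer)
        # Whenever I buffer 317 fills, copy its contents out to document I.
        while len(i_buffer) >= i_buffer_size:
            document_i += i_buffer[:i_buffer_size]
            del i_buffer[:i_buffer_size]
    document_i += i_buffer  # flush whatever remains at the end
    return bytes(document_i)


print(buffered_translate(b"quarterly report", bytes.upper, 4, 3))
```

With a per-portion translation like this one, the result does not depend on the buffer sizes; only the number of transfers between storage and memory changes, which is the small-memory versus large-memory trade-off discussed in the text.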

Variations on the above implementation of the invention will be immediately apparent to one skilled in the art. For example, document processing systems of the kind typified by the "ALLIANCE" generally have relatively small memories 313; consequently, buffers 315, 317, and 321 will not be large, and transfers between storage 303 and these buffers will occur frequently. When the invention is implemented in a system such as a general-purpose data processing system with a large local memory, the buffers may be large enough to accept an entire document and all of program 309, and transfers between storage 303 and local memory 313 may occur only at the beginning and end of the translation operation. Large systems may also include means for permitting direct transfer of data between storage 303 and memory 313; in such systems, data would be transferred between documents 305 and 307 and buffers 315 and 317, and code would be transferred from program 309 to buffer 321, without the direct intervention of processor 301. Further, in a multiprogramming system, state buffer 319 may contain state permitting interruption and resumption of a processing operation.

The second step is analogous to the first. FIG. 4 shows the document processing system during this step. The documents involved are the document with structure I 307, which resulted from the first step, and a document with structure B 401, which is to be the result of the second step. The program involved is I-B composition program 311. The buffers are I buffer 317, state buffer 319, code buffer 321, and B buffer 403, which contains data destined for document 401. Code buffer 321 contains code from I-B composition program 311. During the translation operation, processor 301, under control of I-B composition program 311, reads a portion of document 307 into I buffer 317, translates the contents of I buffer 317 into structure B, and places the result in B buffer 403. When B buffer 403 is full, its contents are written to document 401. Portions of program 311 are copied to code buffer 321 as required to perform the translation operation.

If the document processing system must deal with documents having structures other than structure A, then there must be a program analogous to A-I extraction program 309 for every structure which the document processing system must deal with. Of course, the number of such programs is reduced if all document processing systems adopt the convention that documents on floppy disks are in the intermediate structure. In that case, only two programs are required: I-B composition program 311 and a B-I extraction program for translating documents having the B structure into ones having the I structure.

2. Document Translation according to the Present Invention in a Network: FIG. 5

The situation in a networked system in which all documents which are transferred via the network have the intermediate structure is similar to the one which arises when all documents on floppy disks have the intermediate structure. As shown in FIG. 5, each of the systems in the network must have a composition program for translating documents from the intermediate structure into the structure used in the system and an extraction program for translating documents from the structure used in the system to the intermediate structure.

Network 505 of FIG. 5 connects two systems, system 501 using structure A and system 503 using structure B. Each system has storage 303, processor 301, and memory 313. System 501 further has A-I extraction program 309 and I-A composition program 507, while system 503 has I-B composition program 311 and B-I extraction program 509. FIG. 5 shows systems 501 and 503 as they would be set up in the course of a transfer of a document from system 501 to system 503. System 501 first operates under control of A-I extraction program 309 to translate document with structure A 305 into document with structure I 307 in the manner previously described. When the translation is finished, document with structure I 307 is sent via network 505 from system 501's storage 303 to the equivalent storage in system 503. System 503 then operates under control of I-B composition program 311 to translate document 307 into document with structure B 401. In a transfer of a document from system 503 to system 501, the reverse of the above occurs. System 503, operating under control of B-I extraction program 509, translates a document having structure B into its equivalent having structure I. That document is then sent via network 505 to system 501, which, operating under control of I-A composition program 507, translates the document with structure I into one with structure A.

Since all of the documents transferred via network 505 have the intermediate structure I, a given system attached to the network need only have an extraction program for translating the system's document structure into the intermediate structure and a composition program for translating the intermediate structure into the system's document structure. Thus, regardless of the number of kinds of document structures used by systems attached to the network, a given system need only have two translation programs.
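The saving can be made concrete with a short calculation. The sketch assumes that translation without an intermediate structure requires a separate program for each ordered (source, target) pair of N structures, as the prior-art discussion describes, and that with the intermediate structure each structure needs exactly one extraction program and one composition program.

```python
def direct_programs(n: int) -> int:
    """One translator per ordered (source, target) pair of distinct structures."""
    return n * (n - 1)


def intermediate_programs(n: int) -> int:
    """One extraction and one composition program per structure."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, direct_programs(n), intermediate_programs(n))
```

Counted over all structures, the intermediate approach needs fewer programs once there are more than three structures (at three, both need six); counted per system, it stays at two programs however many structures are in use, which is the point made above.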

In the preceding discussion, it has been presumed that each step in the translation process translates an entire document. However, in embodiments of the invention in which the intermediate document structure is sequential, it is possible to translate from the first structure to the intermediate structure to the second structure in a continuous process in which the document having the intermediate structure is translated into one having the second structure as fast as the document having the intermediate structure is produced. In the stand-alone system of FIGS. 3 and 4, the two steps in the translation can be carried out by separate processes, one executing the extraction program and the other the composition program. In such a system there is no need for a separate document with the intermediate structure; instead, as A-I extraction program 309, executed by the first process, outputs to I buffer 317, I-B composition program 311, executed by the second process, reads from buffer 317 and outputs to buffer 403. When that buffer is full, program 311 writes its contents to document with structure B 401.
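The continuous two-process arrangement can be sketched with Python threads sharing a bounded queue that plays the role of I buffer 317. The intermediate units and the structure-B rendering are hypothetical placeholders: extraction here emits a ("TC", text) unit per source item, and composition simply upper-cases the text. Because the queue is bounded, extraction waits whenever composition falls behind, so no complete intermediate document is ever stored.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the intermediate stream


def run_pipeline(document_a: list[str], buffer_size: int = 4) -> list[str]:
    """Run extraction and composition concurrently through a bounded buffer."""
    i_buffer: queue.Queue = queue.Queue(maxsize=buffer_size)  # I buffer 317
    document_b: list[str] = []

    def extraction() -> None:  # stands in for A-I extraction program 309
        for item in document_a:
            i_buffer.put(("TC", item))  # blocks while the buffer is full
        i_buffer.put(_END)

    def composition() -> None:  # stands in for I-B composition program 311
        while (unit := i_buffer.get()) is not _END:
            _kind, text = unit
            document_b.append(text.upper())

    workers = [threading.Thread(target=extraction),
               threading.Thread(target=composition)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return document_b


print(run_pipeline(["net", "income", "rose"]))
```

The queue preserves order, so the composed output follows the source order exactly; the same shape applies to the networked case, with the queue replaced by the network connection.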

In the networked system of FIG. 5, A-I extraction program 309 executing in system 501 may output from buffer 317 directly to network 505, and I-B composition program 311 executing in system 503 may place data received over network 505 directly into its buffer 317. Again, there is no need for a document with the intermediate structure in storage 303 of either system 501 or system 503. Which of the possible implementations is employed in a given system depends on the characteristics of the system. For example, in a system in which the speed of transfer across network 505 is not a limiting factor, or in which the size of storage 303 is, the document with the intermediate structure may be output directly to network 505. If, on the other hand, the speed of transfer is a limiting factor or the size of storage 303 is not, the document with the intermediate structure may be output to storage 303 and from there to the network.

3. The Intermediate Document Structure in a Preferred Embodiment: FIG. 6

As previously indicated, the intermediate document structure in a preferred embodiment is sequential, i.e., the logical relationships between the components of the document are represented by the locations of the components relative to each other in the document structure. The intermediate document structure of a preferred embodiment is further distinguished by the fact that components of the document which are dependent on other components are nested within the components on which they depend. Both of these characteristics may be seen in FIG. 6, which shows parts of the intermediate document structure for a simple document. FIG. 6 represents a single sequence of data; thus, the points indicated by A--A in the first and second lines of the figure are the same. Wavy lines indicate that the document structure includes material between the wavy lines which has been omitted.

The major component of the embodiment of the intermediate structure shown in FIG. 6 is the segment. The intermediate structure for a document contains at a minimum a single segment. Components of the document may be represented by other segments, which are then nested in the segment representing the entire document. A segment may contain components other than segments. These components include the data codes, generally character codes, which represent the document contents; attributes, which specify modifications to the appearance of the text represented by a sequence of character codes; control specifiers, which indicate modifications which apply to a single point in the text represented by a sequence of character codes; and descriptors, which immediately follow the beginning of a segment, attribute, or control specifier and contain information concerning the segment, attribute, or control specifier to which they belong.

In a preferred embodiment, the beginning of each segment is represented by a segment start code and a segment type code indicating the type of the segment, and the end of each segment is represented by a segment end code and the segment type code for the segment. In FIG. 6, the segment which contains all of the other components of the document has the `stream` type. The start of the segment is marked by start of segment (SOS) 605, which contains start segment code (SSC) 601 and segment type code (STC) 603 indicating the `stream` type. The end of the stream segment is marked by end of segment (EOS) 641 in FIG. 6. EOS 641 for the stream segment contains end segment code (ESC) 637 and a repetition of STC 603 indicating the stream type.
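The start and end markers can be sketched at the byte level with Python's `struct` module. The code values are hypothetical placeholders (the text says only that SSC and ESC are distinct arbitrary 8-bit codes and that type codes are distinct arbitrary 16-bit codes), and big-endian byte order for the type code is an assumption.

```python
import struct

# Hypothetical code values chosen for illustration only.
SSC = 0x01            # start segment code
ESC = 0x02            # end segment code
STC_STREAM = 0x0001   # segment type code for `stream`
STC_TEXT = 0x0002     # segment type code for `text`


def start_of_segment(stc: int) -> bytes:
    """SOS: the 8-bit SSC followed by the 16-bit segment type code."""
    return struct.pack(">BH", SSC, stc)


def end_of_segment(stc: int) -> bytes:
    """EOS: the 8-bit ESC followed by a repetition of the segment type code."""
    return struct.pack(">BH", ESC, stc)


def segment(stc: int, *contents: bytes) -> bytes:
    """Wrap already-encoded contents (data codes or nested segments) in SOS/EOS."""
    return start_of_segment(stc) + b"".join(contents) + end_of_segment(stc)


doc = segment(STC_STREAM, segment(STC_TEXT, b"Hi"))
print(doc.hex(" "))  # 01 00 01 01 00 02 48 69 02 00 02 02 00 01
```

Because each EOS repeats its type code, a reader can check that segments close in the reverse of the order in which they were opened.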

The stream segment contains a descriptor and a segment of the `text` type. The descriptor contains administrative information about the document. Examples of such information include the name of the person who created the document, the name of the person who typed the document, the document's title, a description of its contents, and the document's classification, for example letter or memo. The descriptor begins with start of descriptor (SOD) 611 and ends with end of descriptor (EOD) 617. SOD 611 contains start descriptor code (SDC) 607 and descriptor type code (DTC) 609 identifying the descriptor type, and EOD 617 contains end descriptor code (EDC) 615 and a repetition of DTC 609. The area between SOD 611 and EOD 617 contains descriptor contents (DC) 613. In a preferred embodiment, all descriptors belonging to a segment must immediately follow that segment's SOS 605. Descriptors may not overlap, and DC 613 may not contain a segment or another descriptor.
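The placement rules for descriptors can be expressed as a small check over a token sequence. The token model is a hypothetical simplification: each component is a (kind, value) pair such as ("SOS", "stream"), ("SOD", "author"), ("DATA", "J. Smith"), or ("EOD", "author"). The check accepts a descriptor only when it directly follows an SOS, an SOA, a control specifier ("CTL"), or another descriptor, and only when its contents are data.

```python
def descriptors_well_placed(tokens: list[tuple[str, str]]) -> bool:
    """Check the descriptor rules on a flat (kind, value) token sequence."""
    previous = None
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "SOD":
            # A descriptor must immediately follow the start of its owner
            # (segment, attribute, or control specifier) or a prior descriptor.
            if previous not in ("SOS", "SOA", "CTL", "EOD"):
                return False
            # Its contents may be data only: no segments, no nested descriptors.
            j = i + 1
            while j < len(tokens) and tokens[j][0] == "DATA":
                j += 1
            if j == len(tokens) or tokens[j] != ("EOD", value):
                return False
            previous, i = "EOD", j + 1
            continue
        previous = kind
        i += 1
    return True


stream = [("SOS", "stream"),
          ("SOD", "author"), ("DATA", "J. Smith"), ("EOD", "author"),
          ("SOD", "class"), ("DATA", "memo"), ("EOD", "class"),
          ("SOS", "text"), ("DATA", "Dear ..."), ("EOS", "text"),
          ("EOS", "stream")]
print(descriptors_well_placed(stream))  # True
```

A descriptor that appears after a segment's data, or one whose contents include a segment, is rejected by the same check.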

Segments of `text` type contain the sequence of character or numeric codes which makes up the document and may also contain control specifiers, attributes, descriptors, and other segments. SOS 605 for the text segment of FIG. 6 contains SSC 601 and STC 619 specifying the `text` type, and EOS 639 for the text segment contains ESC 637 and STC 619 for the `text` type. The sequence of character or numeric codes in the text segment is represented by text codes (TC) 621.

The text segment of FIG. 6 also contains an attribute and a control specifier. The attribute is a revision attribute which indicates that a sequence of characters has been revised. The attribute begins with start of attribute (SOA) 627 and ends with end of attribute (EOA) 635. In a preferred embodiment, SOA 627 contains start attribute code (SAC) 623 and an attribute type code (ATC) 625, which indicates the type of the attribute; here, ATC 625 indicates the `revision` attribute. EOA 635 contains end attribute code (EAC) 633 and ATC 625. The attribute applies to all of the characters represented by the character codes occurring between SOA 627 and EOA 635. The actual effect of the attribute depends on the document structure of the document which is finally produced from the intermediate document structure. For example, in some documents, a bar may appear in the margin next to the text represented by the character codes to which the attribute applies. In others, the attribute may have no meaning and will be ignored in the translation process. As will be explained in more detail later, attributes may overlap or be nested within a segment, but may not extend across segment boundaries. All descriptors applying to an attribute immediately follow SOA 627 for the attribute.
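The rule that attributes may overlap or nest but may not cross segment boundaries can be checked with a stack of open segments, each carrying its own list of open attributes. The sketch uses hypothetical (kind, value) tokens such as ("SOS", "text") and ("SOA", "revision"). Because attributes may overlap, an EOA need not close the most recently opened attribute; it must only close one that was opened in the current segment.

```python
def attributes_stay_in_segments(tokens: list[tuple[str, str]]) -> bool:
    """Check that every attribute opens and closes inside the same segment."""
    open_segments: list[tuple[str, list[str]]] = []  # (segment type, open attributes)
    for kind, value in tokens:
        if kind == "SOS":
            open_segments.append((value, []))
        elif kind == "EOS":
            # Must close the innermost open segment, with no attributes left open.
            if not open_segments or open_segments[-1][0] != value or open_segments[-1][1]:
                return False
            open_segments.pop()
        elif kind == "SOA":
            if not open_segments:
                return False
            open_segments[-1][1].append(value)
        elif kind == "EOA":
            # Overlap is allowed, so any attribute open in this segment may end here.
            if not open_segments or value not in open_segments[-1][1]:
                return False
            open_segments[-1][1].remove(value)
    return not open_segments


overlapping = [("SOS", "text"), ("SOA", "revision"), ("TC", "a"),
               ("SOA", "underline"), ("TC", "b"), ("EOA", "revision"),
               ("TC", "c"), ("EOA", "underline"), ("EOS", "text")]
print(attributes_stay_in_segments(overlapping))  # True
```

An attribute that begins in a text segment and ends inside a nested footnote segment fails the check, because the footnote has its own, empty, list of open attributes.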

Control specifier (CTL) 630 in the text segment of FIG. 6 specifies a page break at the point in the sequence of character codes at which CTL 630 occurs. CTL 630 consists of two parts: control code (CC) 629, indicating a control specifier, and control type code (CTC) 631, indicating the kind of control specifier. CTC 631 in FIG. 6 is for a page break. Other CTC codes may specify line breaks, tabs, indentations, and similar text formatting functions. A CTL 630 may be immediately followed by one or more descriptors further describing the formatting operation specified by CTL 630.

In a present embodiment, SSC 601, ESC 637, SDC 607, EDC 615, SAC 623, EAC 633, and CC 629 are distinct arbitrary 8-bit codes; the type codes indicated by STC, DTC, ATC, and CTC are distinct arbitrary 16-bit codes. In other embodiments, the codes may have different lengths. The character codes may belong to a set of character codes such as the ASCII, EBCDIC, or Wang Laboratories, Inc.'s WISCII character code set, or to code sets such as those for Prestel terminals. The numeric codes may include codes used to represent fixed decimal values or floating point values. Other types of segments may have other kinds of codes representing the information they contain.

In a present embodiment of the text segment, confusion between the codes used to define segments, descriptors, attributes, and control specifiers and the codes used to represent data is avoided by means of a unique eight-bit identity code which specifies that the preceding eight bits are not to be interpreted as one of the codes which marks the beginning or end of a segment, attribute, descriptor, or control specifier, but instead as a data code. This technique is illustrated in FIG. 7, where TC 621 in the third portion of the segment shown in the figure contains a character code identical with SSC 601. That character code is followed by identity code (IDC) 707, which prevents the code from being interpreted as the start of a segment. Variations of the technique just described may be employed in other embodiments. For example, the order of the code identifying the component and the code identifying the component type may be reversed, and the identity code may indicate that a following code is not to be interpreted as a type code.
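The identity-code technique can be sketched as a pair of functions: one that escapes data bytes which collide with marker codes by following them with IDC, and one that splits an encoded stream back into data bytes and markers. All code values here are hypothetical, and the tokenizer assumes (as an illustration only) that no 16-bit type code has IDC as its high byte, so a marker byte followed by IDC can only be an escaped data byte.

```python
# Hypothetical values: seven distinct 8-bit marker codes and an identity code.
SSC, ESC, SDC, EDC, SAC, EAC, CC = range(0x01, 0x08)
MARKERS = {SSC, ESC, SDC, EDC, SAC, EAC, CC}
IDC = 0x7F


def escape_data(data: bytes) -> bytes:
    """Follow every data byte that collides with a marker code with IDC."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b in MARKERS:
            out.append(IDC)
    return bytes(out)


def tokenize(stream: bytes) -> list[tuple]:
    """Split an encoded stream into ("DATA", byte) and ("MARK", code, type) tokens."""
    tokens: list[tuple] = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b in MARKERS and i + 1 < len(stream) and stream[i + 1] == IDC:
            tokens.append(("DATA", b))  # escaped: a data byte, not a marker
            i += 2
        elif b in MARKERS:
            # A real marker: every marker code is followed by a 16-bit type code.
            type_code = int.from_bytes(stream[i + 1:i + 3], "big")
            tokens.append(("MARK", b, type_code))
            i += 3
        else:
            tokens.append(("DATA", b))
            i += 1
    return tokens


STC_TEXT = 0x0002  # hypothetical `text` segment type code
encoded = (bytes([SSC]) + STC_TEXT.to_bytes(2, "big")
           + escape_data(b"A\x01B")  # the middle data byte equals SSC
           + bytes([ESC]) + STC_TEXT.to_bytes(2, "big"))
print(tokenize(encoded))
```

The data byte equal to SSC comes back as data rather than as the start of a segment, mirroring IDC 707 in FIG. 7.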

An advantage of the intermediate document structure of the present invention is its adaptability. In a present embodiment, a document has five kinds of components: segments, descriptors, attributes, control specifiers, and data codes. However, segments, descriptors, attributes, and control specifiers are identified by means of 8-bit codes, and consequently, new kinds of components may be added without changing the basic nature of the document structure. The same is true with regard to new types of segments, attributes, descriptors, and control specifiers. The types of these components are specified by 16-bit codes, and thus it is possible to have up to 2^16 (65,536) different types of segment and the same number of types for the attributes, the descriptors, and the control specifiers. Such adaptability of the intermediate structure is required to deal with the progress of document processing technology. For example, documents were originally composed only of text; however, as the technology of document processing has expanded, documents have come to include images and voice data, and the present invention includes segment types for voice data and images and for the binary data representing a voice signal or an image. As other items are included in documents, corresponding segment types may be added to the intermediate structure.

4. Segment Types in a Present Embodiment: FIGS. 7 and 8

In a present embodiment, there are 11 segment types:

1. stream: the stream segment type represents an entire document and contains the segments representing the components of the document.

2. text: the text segment type represents the body of the text of the document.

3. header: the header segment type represents the page headers used in a document.

4. footer: the footer segment type represents the page footers used in a document.

5. note: the note segment type represents text which is a note to the makers of the document. Notes are printed only on request.

6. footnote: the footnote segment type represents the text of a footnote which refers to a point in the text corresponding to the location of the footnote segment.

7. shelf: the shelf segment type represents data which has been stored for later use in the document.

8. external reference: the external reference segment type represents information which is required for the document but not contained in the document. The contents of the external reference segment specify how the information referred to is to be located.

9. binary: the binary segment type contains information represented by binary data codes instead of character codes. In a present embodiment, the binary segment type contains the data used to represent images and voice signals.

10. image: the image segment type contains information required to interpret the binary data in a binary segment representing an image.

11. voice: the voice segment type contains information required to interpret the binary data in a binary segment representing voice data.

Of these types, the text, header, footer, note, footnote, and shelf segments in a present embodiment all represent text sequences, and consequently may contain TCs 621, attributes, and control specifiers. FIG. 7, showing a detailed representation of a text segment, is exemplary for all of these segment types. The text segment of FIG. 7 represents text which begins with a title which is centered and underlined and which has been revised. The segment begins with SSC 601 and STC 619 specifying a text segment, and then contains, in order: CC 629 and CTC 702 specifying that the following text is to be centered; SAC 623 and ATC 625 specifying the beginning of a revised section of text; SAC 623 and ATC 703 specifying the beginning of a section of text which is underlined; attribute descriptor 711, specifying that the underline is to be a single underline and including SDC 607, DTC 709 indicating single underline, EDC 615, and DTC 709; TC 621 representing the sequence of characters in the title; EAC 633 and ATC 703 marking the end of the portion to be underlined; two occurrences of CC 629 and CTC 705 `return` marking the end of the title and a blank line following the title; TC 621 containing the text following the title; EAC 633 and ATC 625 marking the end of the portion of the text which was revised; and additional TC 621. The segment ends with ESC 637 and STC 619. As previously explained, IDC 707 and SSC 601 in the third line of the figure show how the identity code is used to distinguish data codes from those which indicate the start or end of a component of the document. FIG. 7 also shows how, as previously explained, attributes may overlap.
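Which attributes apply to which text can be read directly off the FIG. 7 sequence. The sketch below models that sequence as hypothetical (kind, value) tokens, with placeholder strings standing in for the title and body text (the figure does not give them) and with the identity-coded data byte omitted. For each run of text codes it records the attributes open at that point; descriptors and control specifiers do not change the set of open attributes and are skipped.

```python
FIG7 = [
    ("SOS", "text"),
    ("CTL", "center"),
    ("SOA", "revision"),
    ("SOA", "underline"),
    ("SOD", "single underline"), ("EOD", "single underline"),
    ("TC", "<title>"),
    ("EOA", "underline"),
    ("CTL", "return"), ("CTL", "return"),
    ("TC", "<text following the title>"),
    ("EOA", "revision"),
    ("TC", "<additional text>"),
    ("EOS", "text"),
]


def attribute_runs(tokens: list[tuple[str, str]]) -> list[tuple[str, tuple[str, ...]]]:
    """Pair each run of text codes with the attributes in force over it."""
    active: list[str] = []
    runs = []
    for kind, value in tokens:
        if kind == "SOA":
            active.append(value)
        elif kind == "EOA":
            active.remove(value)  # overlap allowed: need not be the last opened
        elif kind == "TC":
            runs.append((value, tuple(active)))
    return runs


for text, attrs in attribute_runs(FIG7):
    print(text, attrs)
```

The title is both revised and underlined, the text after the title is only revised, and the additional text carries no attribute, matching the description of FIG. 7.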

In a present embodiment, the text, header, footer, note, footnote, and shelf segment types all have the general form just presented; however, the header and footer segment types in a present embodiment may not contain other segments. There is no such restriction for the text, note, footnote, and shelf types. For example, a text segment may include a note or footnote segment and, if the text includes a picture, an image segment and a binary segment representing the image.
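The 11 segment types and the nesting restriction just described can be sketched as follows. The numeric type codes are hypothetical (the text says only that type codes are distinct 16-bit values), segments are modeled as (type, children) pairs, and only the restriction stated here, that header and footer segments may not contain other segments, is checked.

```python
from enum import IntEnum


class SegmentType(IntEnum):
    # Hypothetical 16-bit type codes for the 11 segment types.
    STREAM = 0x0001
    TEXT = 0x0002
    HEADER = 0x0003
    FOOTER = 0x0004
    NOTE = 0x0005
    FOOTNOTE = 0x0006
    SHELF = 0x0007
    EXTERNAL_REFERENCE = 0x0008
    BINARY = 0x0009
    IMAGE = 0x000A
    VOICE = 0x000B


# Text-bearing types that may not contain other segments.
NO_NESTED_SEGMENTS = {SegmentType.HEADER, SegmentType.FOOTER}

Segment = tuple[SegmentType, list]  # (type, child segments)


def nesting_allowed(segment: Segment) -> bool:
    """Recursively reject header or footer segments that contain segments."""
    seg_type, children = segment
    if children and seg_type in NO_NESTED_SEGMENTS:
        return False
    return all(nesting_allowed(child) for child in children)


document = (SegmentType.STREAM, [
    (SegmentType.HEADER, []),
    (SegmentType.TEXT, [(SegmentType.FOOTNOTE, []),
                        (SegmentType.IMAGE, []), (SegmentType.BINARY, [])]),
])
print(nesting_allowed(document))  # True
```

Moving the footnote into the header segment would make the same check return False.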

A segment of the external reference type has as its contents the information required to locate the external reference. For example, if the external reference is to another document, the external reference segment will contain the information which the document processing system requires to locate the other document.

In a present embodiment, a binary segment is always preceded by a segment specifying how the data contained in the binary segment is to be interpreted. Presently, such interpretive segments are either voice segments or image segments. Other embodiments may of course include other kinds of interpretive segments. FIG. 8 presents a detailed representation of one such combination of an interpretive segment with a binary segment. In that figure, the interpretive segment is a voice segment. The voice segment begins with SSC 601 and STC 801 for the voice type and ends with ESC 637 and STC 801 for the voice type. Its contents are the information required to properly interpret the contents of the binary segment. In a present embodiment, the contents of the voice segment include audio data type (ADT) 803, which specifies the type of audio data contained in the binary segment; V 805, specifying the version of that type; the digitization rate (DR) 807 for the audio data; and the length of time (T) 809 represented by the following binary data.

The binary segment begins with SSC 601 and STC 811 for the binary type and ends with ESC 637 and STC 811 for the binary type. The contents of the segment include L 813, specifying the length of the data in bytes, and BC 815, containing the binary data codes. The contents of L 813 and BC 815 are interpreted solely as binary data, and consequently, a binary segment in a present embodiment cannot contain other segments, attributes, or control specifiers.
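Because BC 815 is never scanned for codes, a reader must rely on L 813 to find where the binary data ends. The following is a minimal Python sketch under assumed encodings. The one-byte code values and the 4-byte big-endian length field are hypothetical and were chosen only for illustration.

```python
import struct

# Hypothetical one-byte code values; the patent does not specify an encoding.
SSC, ESC = 0xF0, 0xF1   # start / end of segment
STC_BINARY = 0x0B       # segment type code for the binary type

def encode_binary_segment(data: bytes) -> bytes:
    """SSC STC, then L as an assumed 4-byte big-endian length, then BC, then ESC STC."""
    return (bytes([SSC, STC_BINARY])
            + struct.pack(">I", len(data))
            + data
            + bytes([ESC, STC_BINARY]))

def decode_binary_segment(buf: bytes, pos: int = 0):
    """Return (data, next position). The BC bytes are copied verbatim: the length
    field, not a scan for ESC, determines where the binary data ends."""
    if buf[pos:pos + 2] != bytes([SSC, STC_BINARY]):
        raise ValueError("not a binary segment")
    if len(buf) < pos + 6:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf, pos + 2)
    start = pos + 6
    end = start + length
    data = buf[start:end]
    if len(data) != length or buf[end:end + 2] != bytes([ESC, STC_BINARY]):
        raise ValueError("truncated or malformed binary segment")
    return data, end + 2
```

A payload that happens to contain the SSC or ESC code values round-trips unchanged, because the decoder never interprets those bytes.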

The relationship between the image segment and the binary segment containing the image data is substantially the same as that between the voice segment and the binary segment containing the voice data. In a present embodiment, the information used to interpret the image data includes image type, horizontal and vertical size, horizontal and vertical resolution, the encoding scheme, the version of the encoding scheme, the encoding parameter, a code indicating the hardware which was the source of the image, the display format, and the display color. In other embodiments, the binary segment may contain codes representing video images, and the image data may include the information needed to produce a video image from those codes.

5. Attribute Types in a Present Embodiment

A present embodiment of the invention has 11 attribute types:

1. underscore: the underscore attribute indicates that the sequence of characters specified in the attribute is to be underscored.

2. script: the script attribute indicates that the specified sequence of characters is a subscript or superscript.

3. bold: the bold attribute indicates that the specified sequence is to be in bold face type.

4. optional: the optional attribute indicates that the specified sequence of characters is to be displayed or not as the user specifies.

5. no break: the no break attribute indicates that the specified sequence of characters will not be broken when lines are formatted.

6. strike through: the strike through attribute indicates that the characters in the specified sequence will be overstruck by a specified character.

7. table of contents: the table of contents attribute indicates that the characters in the specified sequence are to be included in the table of contents.

8. index: the index attribute indicates that the characters in the specified sequence are to be included in the document's index.

9. revision: the revision attribute indicates that the text represented by the specified sequence has been revised.

10. reverse video: the reverse video attribute indicates that the characters in the specified sequence are to be displayed in a manner which is the reverse of that usually used.

11. italics: the italics attribute indicates that the characters in the specified sequence are to be in italics.

Several of the above attributes may have several variants. For example, in a present embodiment, underscore may specify a one-line or two-line underscore, and script may specify a superscript or a subscript. As pointed out in the discussion of the text segment and shown in FIG. 7, a given variant is specified by means of an attribute descriptor 711 in the attribute.
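The eleven attribute types and the two variant families named above could be modeled as follows. This is a hypothetical Python sketch. The numeric values, the token form, and the rule that rejects unknown variants are assumptions and are not part of the described embodiment.

```python
from enum import Enum

class Attribute(Enum):
    """The eleven attribute types listed above (numeric values are arbitrary)."""
    UNDERSCORE = 1
    SCRIPT = 2
    BOLD = 3
    OPTIONAL = 4
    NO_BREAK = 5
    STRIKE_THROUGH = 6
    TABLE_OF_CONTENTS = 7
    INDEX = 8
    REVISION = 9
    REVERSE_VIDEO = 10
    ITALICS = 11

# Variants named in the text; other attributes are treated as having none here.
VARIANTS = {
    Attribute.UNDERSCORE: {"single", "double"},
    Attribute.SCRIPT: {"superscript", "subscript"},
}

def start_attribute(attr, variant=None):
    """Tokens that open an attribute, followed by a variant descriptor if given."""
    tokens = [("SAC", attr.name)]
    if variant is not None:
        if variant not in VARIANTS.get(attr, set()):
            raise ValueError(f"{attr.name} has no variant {variant!r}")
        tokens += [("SDC", variant), ("EDC", variant)]
    return tokens
```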

6. Control Specifier Types in a Present Embodiment

In a present embodiment, there are thirteen types of control specifiers. They are the following:

1. alignment: the text at the point of the control specifier is to be aligned on a character such as a decimal point, comma, or asterisk.

2. tab alignment: the text at the point of the tab alignment control specifier is to be aligned with the next tab stop.

3. indent alignment: the left margin at the point of the indent alignment control specifier is temporarily reset by a previously specified amount.

4. center: the line following the control specifier is centered.

5. hard return: the hard return control specifier specifies a point at which the current line must end until the author of the document specifies otherwise.

6. soft return: the soft return control specifier specifies the point at which the current line ends as the document is currently formatted.

7. hard page: the hard page control specifier specifies the point at which the current page must end until the author of the document specifies otherwise.

8. soft page: the soft page control specifier specifies the point at which the current page ends as the document is currently formatted.

9. column: the column control specifier specifies the point at which a column begins. Descriptors following the column control specifier specify the line spacing, line justification, lines per inch, and pitch in the column.

10. set format: the set format control specifier specifies the point at which a new format for the text begins. Descriptors following the set format specifier specify the new format. The descriptors may specify line spacing, settings for alignment, tabs, and indentation, and settings for centering, right justification, line justification, lines per inch, and pitch.

11. set character set: the set character set control specifier specifies the point in the text at which a new interpretation of the document's character codes begins. The interpretation is specified by a descriptor following the set character set control specifier.

12. merge: the merge control specifier indicates a point at which text characters from another document will be inserted into this document.

13. no merge: the no merge control specifier indicates a point at which no merging will be permitted.

As is apparent from the above descriptions, where a control specifier has a number of possible effects on the format of the document, the exact effects are specified by means of descriptors immediately following the control specifier.
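The exact effect of a control specifier is carried by the descriptors immediately following it. A reader can therefore collect them by scanning forward until the next token does not start a descriptor. The following is a minimal Python sketch that assumes a hypothetical token form in which a descriptor is ("SDC", type), ("DATA", value), ("EDC", type).

```python
def read_control(tokens, pos):
    """Read the control specifier at tokens[pos] and the descriptors after it.

    Returns (control type, {descriptor type: value}, index of the next token)."""
    code, control_type = tokens[pos]
    if code != "CC":
        raise ValueError("expected a control specifier")
    pos += 1
    params = {}
    # Descriptors belonging to the specifier follow it immediately.
    while pos < len(tokens) and tokens[pos][0] == "SDC":
        triple = tokens[pos:pos + 3]
        if len(triple) != 3 or triple[2] != ("EDC", triple[0][1]):
            raise ValueError("unterminated or mismatched descriptor")
        params[triple[0][1]] = triple[1][1]
        pos += 3
    return control_type, params, pos
```

For a set format specifier, the loop stops at the first text token, so the returned index points at the text to which the new format applies.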

7. Using Descriptors to Name Document Components: FIG. 9

In some prior art document structures, document components may have character-string names. The names may be used in various document processing operations to refer to the components. In a present embodiment of the intermediate document structure, a component's name is represented by a descriptor of the `name` type. FIG. 9 shows how a descriptor of the name type may be used to represent the name of a text shelf segment. The descriptor follows immediately after STC 901 for the shelf and consists of SDC 607 and DTC 903 for the `name` type, a character sequence 905 representing the name, EDC 615, and DTC `name` 903.
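The following hypothetical Python sketch uses an assumed token form (not the embodiment's encoding) to show a named shelf segment and a function that recovers its name.

```python
def shelf_segment(name, content):
    """Build a shelf segment whose name descriptor directly follows its start codes."""
    return [
        ("SSC", "shelf"),
        ("SDC", "name"), ("DATA", name), ("EDC", "name"),   # name descriptor
        ("TC", content),
        ("ESC", "shelf"),
    ]

def segment_name(segment):
    """Return the name from a name descriptor right after SSC, or None if unnamed."""
    if len(segment) >= 4 and segment[1] == ("SDC", "name") and segment[3] == ("EDC", "name"):
        return segment[2][1]
    return None
```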

8. A Document with a Prior-art Structure and its Equivalent with the Intermediate Structure: FIGS. 10-11

The discussion next turns to a specific example of translation between a given document structure and the intermediate structure. First, a document having a document structure of the type presently used in word processing and an equivalent document having the intermediate structure of the present invention are presented. Thereupon, the methods by which the translations are accomplished are discussed.

FIG. 10 is an illustration of the document structure of the type presently used. The structure is made up of equal-sized numbered blocks in a file. The blocks have three different kinds of contents: administrative information about the document, indexes by means of which components of the document may be located, and the actual text of the document. The administrative blocks are at fixed locations in the file. Blocks of other types may be anywhere in the file. Thus, except for the administrative blocks, there is no relationship between the location of a block in the file and its function in the document. Blocks are located in the file by means of pointers specifying block numbers. The pointers may be used to link blocks into chains and to form indexes by which the blocks may be located.

The document illustrated in FIG. 10 contains two pages of text and a named text shelf. Each page has a header and footer, and a portion of the text on one of the pages is underscored. The pages of text are contained in document body chain 1025, which consists of text blocks 1002. Each text block 1002 in the chain is linked by means of pointers to the preceding and following blocks in the chain. The double linking makes it possible to move easily from one part of the document body to another.
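The following Python sketch shows the doubly linked chain. Block numbers 21, 9, and 15 appear in FIG. 10, but arranging them as a single three-block chain is an assumption made for the example. The actual document body may contain more blocks.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class TextBlock:
    text: str
    prev: Optional[int] = None   # number of the preceding block; None at the head
    next: Optional[int] = None   # number of the following block; None at the tail

def forward(blocks: dict, first: int) -> Iterator[int]:
    """Yield block numbers from the head of a chain to its tail."""
    n = first
    while n is not None:
        yield n
        n = blocks[n].next

def backward(blocks: dict, last: int) -> Iterator[int]:
    """Yield block numbers from the tail of a chain back to its head."""
    n = last
    while n is not None:
        yield n
        n = blocks[n].prev

# Hypothetical three-block body chain using block numbers that appear in FIG. 10.
body = {
    21: TextBlock("page 1 text", prev=None, next=9),
    9: TextBlock("page 2 text", prev=21, next=15),
    15: TextBlock("more page 2 text", prev=9, next=None),
}
```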

The text blocks in the chain have two major components: the text portion (T) and the attribute portion (A). T contains character codes for the text of the document; codes representing tabs, indentations, page breaks, and the like; and special codes called attribute characters. The last character in T of each text block is a special etx character code indicating the end of T. In FIG. 10, attribute characters appear as AC 1033 and the etx character as etx 1031.

The A portion of a text block 1002 contains informational attributes and visual attributes. Each informational attribute corresponds to an attribute character and contains references by means of which other text blocks 1002 containing the information required for the informational attribute may be located. The information applies at the location in the text specified by the attribute character corresponding to the informational attribute. In FIG. 10, there are three format attributes (FA) 1035, each one specifying a format for text and corresponding to an AC 1033 in T of the text block 1002 containing the FA 1035. The visual attributes specify ranges of characters in the text to which a modification such as underlining or bold face type applies. In FIG. 10, there is one visual attribute, VA 1023, specifying which portion of the text is underlined.

Document body chain 1025 contains two pages of text. In the document structure of FIG. 10, each page must have a FA 1035. The FA 1035 specifies the page's format, any headers or footers for the page, and the fact that the AC 1033 corresponding to the FA 1035 also specifies the location of the beginning of a new page. The format, header, and footer are specified by means of references in FA 1035 to text block chains containing the information required for the format, header, and footer. Thus, FA 1035 in the first block (21) in page 1 1027 has three references, represented by FOR, HR, and FR. FOR refers to the text block (35) containing the page format, HR refers to the text block (12) containing the header, and FR refers to the text block (26) containing the footer. The first text block in page 2 1029 has the same informational attribute as the first text block in page 1 1027. In addition, text block (15) of that page contains VA 1023, the visual attribute indicating the part of the text which is underscored.

The chains of text blocks containing the header, footer, and format referred to in FA 1035 are each made up of only one block in the present example document. Text block (26) contains footer 1017, text block (12) contains header 1019, and text block (35) contains format 1021. Header 1019 and footer 1017 both have FAs 1035 containing the reference FOR referring to format 1021. Headers, footers, and text thus all share the same format. The final component of the document of FIG. 10, text shelf 1015, is made up of another chain of text blocks containing two blocks, (20) and (30).

The remaining parts of the document structure of FIG. 10 are four administrative blocks 1031 containing document info blocks 1001, document table (DT) 1003, and three index blocks 1033 including name index block (NIB) 1005, page index block (PIB) 1007, and reference index block (RIB) 1009. Document info blocks 1001 include administrative information about the document such as the document's title, creator, subject, size, and so forth. DT 1003 contains pointers to the document's indexes: P10 points to NIB 1005, P16 points to PIB 1007, and P40 points to RIB 1009. DT 1003 is always at a fixed location in the document structure, and consequently, any component of the document can be located by using DT 1003 to find the proper index and then using the index to locate the component.

The three index blocks correspond to three indexes: a name index by which a named component of the document may be located using the component's name, a page index by which individual pages of the document may be located, and a reference index by which chains containing information referred to by references in informational attributes may be located. In the document of FIG. 10, each of these indexes is contained in one index block: the name index in NIB 1005, the page index in PIB 1007, and the reference index in RIB 1009. In larger documents, an index may contain more than one index block.

The name index is made up of name index entries (NIEs) 1006. Each name index entry contains a name and a pointer to the first text block of the chain containing the named component. Thus, NIE 1006 in NIB 1005 contains P20 pointing to text block (20), the first text block in text shelf 1015. The page index in PIB 1007 is made up of page index entries (PIEs) 1008. Each PIE contains a page number and a pointer to the first text block for the page. The document of FIG. 10 has two pages, the first beginning on block (21) and the second beginning on block (9), and accordingly, the PIE for page 1 contains P21 and that for page 2 contains P9. The reference index in RIB 1009 is made up of reference index entries (RIEs) 1010. Each RIE contains a reference number (represented here by FOR, HR, and FR) and a pointer to the first block of the chain containing the referenced information, here block (35) for FOR, block (12) for HR, and block (26) for FR.
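Locating a component thus takes two steps: DT 1003 gives the index block, and an index entry gives the first block of the component. The following minimal Python sketch uses the pointer values of FIG. 10. The dictionary layout and the shelf name "my shelf" are hypothetical, and each index is assumed to fit in one index block, as in FIG. 10.

```python
# Block contents keyed by block number. Pointer values follow FIG. 10;
# the dictionary layout and the shelf name "my shelf" are hypothetical.
FILE = {
    "DT": {"name": 10, "page": 16, "reference": 40},   # DT 1003: P10, P16, P40
    10: {"my shelf": 20},                               # NIB 1005: NIE -> P20
    16: {1: 21, 2: 9},                                  # PIB 1007: PIEs -> P21, P9
    40: {"FOR": 35, "HR": 12, "FR": 26},                # RIB 1009: RIEs
}

def locate(file, index_kind, key):
    """Return the first block of a component: DT -> index block -> entry."""
    index_block = file["DT"][index_kind]   # DT is at a fixed, known location
    return file[index_block][key]
```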

The components of the document structure and those of the intermediate document structure correspond as follows:

    ______________________________________________________
    Structure of FIG. 10          Intermediate Structure
    ______________________________________________________
    entire document               stream segment
    document body chain 1025      text segment
    text shelf 1015               text shelf segment
    footer 1017                   footer segment
    header 1019                   header segment
    format 1021                   set format control specifier
    tabs, page breaks, etc.       control specifiers
    VA 1023                       attribute
    Doc info blocks 1001          descriptors
    ______________________________________________________

The intermediate structure has no components corresponding to DT 1003 or the index blocks, since the relationship of the components to each other in the intermediate structure is determined by their positions relative to each other in the intermediate structure.

FIG. 11 shows the translation of the document of FIG. 10 into an equivalent document with the intermediate structure. That document begins with SOS for the `stream` type 1101 and ends with EOS for the stream type 1151. Immediately following SOS 1101 are descriptors 1103 containing the information from document information blocks 1001 of the FIG. 10 document. Then comes SOS 1105 for the `text` segment for the contents of document body chain 1025, followed by PB CTL 1107, a page break control specifier marking the beginning of page 1, a set format control specifier 1109, and text format descriptors 1111 containing information as to how the text is to be formatted. The format described in text format descriptors 1111 remains in effect until another SF CTL 1109 occurs in the text segment. The information in descriptors 1111 is obtained from format 1021 of the FIG. 10 document. Following descriptors 1111 is a header segment for the page 1 header. The segment includes SOS `header` 1113, SF CTL 1109 for the header format, header format descriptors 1115, header text 1117, and EOS `header` 1119. Header text 1117 is obtained from header 1019, and header format descriptors 1115 from format 1021, as specified by FA 1035 in header 1019.

Next in the intermediate structure comes a footer segment for the page, containing SOS `footer` 1121, SF CTL 1109, footer format descriptors 1123, footer text 1125, and EOS `footer` 1127. Like a format, once a header or footer is established, it remains effective until a new one is established. Following the footer segment is page 1 text 1129. At the end of the text comes PB CTL 1107 for the page break at the end of the first page. Since page 2 has the same format, header, and footer as page 1, there are no format, header, or footer segments for page 2. Next is page 2 text 1131, from page 2 1029. Page 2 1029 contains a visual attribute indicating an underscore, and consequently, page 2 text 1131 includes an underscore attribute, which contains SOA `underscore` 1133, an attribute descriptor 1135 indicating whether the underscore is single or double, the underscored portion of text 1131, and EOA `underscore` 1139. Thereupon come the remaining, non-underscored portion of text 1131 and EOS `text` 1141, marking the end of the text segment. The rest of the stream segment is occupied by the text shelf segment corresponding to text shelf 1015. That segment includes SOS `shelf` 1143, a descriptor 1145 containing the shelf name (obtained from NIB 1005), the shelf content 1147 from the text blocks in text shelf 1015, and EOS `shelf` 1149. Following the text shelf segment and terminating the intermediate document structure is EOS `stream` 1151.
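Because a format, header, or footer stays in effect until it is replaced, a translator needs to emit these items only when they change. The following is a simplified Python sketch that assumes pages are given as (format, header, footer, text) tuples. Unlike FIG. 11, it omits the set format control specifier and descriptors inside each header and footer segment.

```python
def emit_pages(pages):
    """Emit a text segment for pages given as (format, header, footer, text)
    tuples, writing format/header/footer items only when they change."""
    out = [("SSC", "text")]
    current = {"format": None, "header": None, "footer": None}
    for page_format, header, footer, text in pages:
        out.append(("CC", "page break"))           # PB CTL at the start of each page
        if page_format != current["format"]:
            out += [("CC", "set format"), ("DESC", page_format)]
            current["format"] = page_format
        for kind, value in (("header", header), ("footer", footer)):
            if value != current[kind]:
                out += [("SSC", kind), ("TC", value), ("ESC", kind)]
                current[kind] = value
        out.append(("TC", text))
    out.append(("ESC", "text"))
    return out
```

For the two pages of FIG. 10, which share one format, header, and footer, the output has two page breaks but only one set format item, one header segment, and one footer segment.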

9. Translation Methods

As may be seen by a comparison of FIGS. 10 and 11, relationships which are expressed by means of attributes, indexes, and pointers in the document structure of FIG. 10 are expressed by means of nested segments, attributes, and descriptors in the document structure of FIG. 11. Thus, in the document structure of FIG. 10, the fact that each page has an identical header is expressed by the fact that the reference HR appears in FA 1035 for each page, while the same fact is expressed in the document structure of FIG. 11 by placing a header segment in the text segment ahead of the text for the first page to which it applies.

In programming terms, what happens is that when AC 1033 is encountered in T of block (21), the processing of document body chain 1025 must be interrupted, FA 1035 must be examined, and if it specifies a page break, new header, new footer, or new format, a PB CTL 1107, a header segment, a footer segment, or a SF CTL 1109 and its associated descriptors 1111 must be placed in the intermediate structure. After that has been done, the processing of document body chain 1025 must be resumed. If, as is the case here, the header or footer referred to in FA 1035 itself has in its text an AC 1033 and that AC 1033 refers to another FA 1035 containing a reference (here the reference to format 1021, FOR), then the processing of the header or footer must be interrupted to process the chain of blocks referred to by that reference. The nested components of the intermediate document structure thus correspond to a processing sequence in which the processing of a given component of the document of FIG. 10 is begun, is interrupted when information from another component is required, and is resumed when the processing of the other component is complete.

In a present embodiment, the required processing sequence is achieved by means of a stack which is part of State Buf 319. When the processing of a first component is interrupted, state including the kind of component and the current location in the component is saved on the stack. Then the new component is located and processed. When the processing of the new component is complete, the saved state is restored from the stack and processing of the first component continues. Generally speaking, in the document structure of FIG. 10, an interruption or resumption of processing of a component involves a shift from one chain of text blocks to another.
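The save-and-restore discipline can be sketched in Python as follows. This is a simplified model and not the embodiment's program. Each chain is assumed to be a list whose items are either text or a ("REF", chain name) pair, which stands for a reference reached through an AC 1033 and its FA 1035. Every chain is also assumed to become a segment, whereas in the embodiment a format chain becomes a control specifier instead.

```python
def translate(chains, kinds, root):
    """Flatten pointer-linked chains into nested segments using an explicit stack.

    chains: chain name -> list of items (text strings or ("REF", chain name)).
    kinds:  chain name -> segment type to emit for that chain."""
    out = [("SSC", kinds[root])]
    stack = []                      # saved (chain, position) states
    chain, pos = root, 0
    while True:
        if pos < len(chains[chain]):
            item = chains[chain][pos]
            pos += 1
            if isinstance(item, tuple) and item[0] == "REF":
                stack.append((chain, pos))   # save state: where to resume
                chain, pos = item[1], 0      # interrupt: switch to referenced chain
                out.append(("SSC", kinds[chain]))
            else:
                out.append(("TC", item))
        else:
            out.append(("ESC", kinds[chain]))  # chain finished: close its segment
            if not stack:
                return out                     # nothing saved: translation complete
            chain, pos = stack.pop()           # resume the interrupted chain
```

Because states are restored in the reverse order in which they were saved, the segments emitted are properly nested, which mirrors the nesting of FIG. 11.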

FIG. 12 shows the main translation loop of a preferred embodiment of a translation program for translating the document structure of FIG. 10 into the intermediate document structure. During operation of the loop in a system such as that shown in FIG. 3, the portions of the document which are currently being translated are read from storage 303 into A buf 315. As the intermediate document is produced, it is written to I buf 317, and from there to storage 303. The portions of the program currently being executed are contained in code buf 321. State buf 319 contains the stack, a position block indicating the location of the character currently being processed, a value indicating the kind of component being processed, the character currently being processed, and other values necessary for the operation of the program.

The loop begins with initialization block 1201. Procedures in that portion of the program output SOS `stream` 1101, then read the contents of doc info blocks 1001 and place descriptors 1103 containing the information from those blocks immediately after SOS 1101. Initialization continues by using DT 1003 to locate the first text block in document body chain 1025. Once the block is found, the program outputs SOS `text` 1105 and begins to process the characters in T one at a time. Processing is done in the main translation loop.

On entering the main translation loop, two boolean variables, result and not$exhausted, are set to True (block 1203). As may be seen from decision block 1205, the main translation loop continues to operate until either result or not$exhausted is False. result is set to False if any processing step in the main translation loop fails, and not$exhausted is set to False when the entire document has been translated. The main translation loop thus terminates either as a result of a failure in translation or upon completion of translation.

Translation then commences with the first character in T of the first text block in page 1 1027 and continues one character at a time (block 1209). As shown by block 1211, if the character being processed is any character other than etx 1031, it is processed by process char 1213. As will be explained in more detail later, if the character is a text character, processing of the current chain continues. If it is an AC 1033, state is saved and the next character processed by the main loop is the first byte of the corresponding informational attribute. If one of the bytes in the informational attribute is a reference to another text chain, the program saves state, outputs a code indicating the type of the chain it is processing, and outputs the characters necessary to indicate the start of the new component being processed. Processing then continues with bytes from the text chain referred to in the reference.

If the character is etx 1031, the end of T in a text block in the chain currently being processed has been reached. The manner in which processing continues is determined by whether the text block is the last on a page, the last in a chain, or the last in the document. If the text block is not the last in a chain, it will contain a pointer to its successor. If the text block is the last on a page, the first character in the successor block will be an AC 1033 corresponding to a FA 1035 specifying a page break. When the text block is neither the last in a chain nor the last on a page, processing continues with the first character of T in the successor block (decision block 1215). When the text block is the last on a page (decision block 1225), that first character will be AC 1033 corresponding to FA 1035 specifying the page break, and a PB CTL 1107 will be output in the course of processing the AC 1033. The program determines whether the text block is the last in the document by examining the stack: if the stack is empty, there are no other chains to be processed and no more characters in the present chain. When the text block is the last in the chain, but not the last in the document (decision block 1217), processing of the component represented by the chain has been completed. The program then writes the codes necessary to end the component to the intermediate document (block 1218) and restores the state saved when processing of the current chain began (block 1219). That state contains the location of the next character to be processed, and processing continues as described. If the text block is the last in the document, not$exhausted is set to False (block 1221), which terminates the main translation loop. On termination, the codes necessary to end the stream segment containing the document are output to the intermediate document.
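The three-way decision taken at etx 1031 can be summarized in a short Python sketch. The TextBlock type and the returned action labels are hypothetical. The caller is assumed to pop the stack when it is told to restore state, and the second element of the result plays the role of not$exhausted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextBlock:
    number: int
    next: Optional[int] = None   # successor block number; None for the last block in a chain

def after_etx(block, stack):
    """Decide how the FIG. 12 loop continues at etx. Returns (action, not_exhausted)."""
    if block.next is not None:
        # Not last in its chain (decision block 1215). A page-final block needs no
        # special case: its successor begins with the AC for the page-break FA.
        return ("continue with block", block.next), True
    if stack:
        # Last in its chain but not in the document (decision block 1217): end the
        # component, then restore the state on top of the stack (blocks 1218, 1219).
        return ("end component and restore", stack[-1]), True
    # No successor and an empty stack: the document is exhausted (block 1221).
    return ("end stream segment", None), False
```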

Continuing with FIG. 13, which presents a detail of process char block 1213, the program first determines whether the character being processed is part of a sequence of text (decision block 1300). If it is, the program determines whether the character is an AC 1033 (block 1301). If it is, the program saves the current state (block 1303) and resets the position block to indicate the beginning of the informational attribute associated with AC 1033 (block 1305). Thus, the next character fetched in the main loop is the first byte of the associated attribute. If the character is not an AC 1033, the program next determines whether it is a control character, i.e., whether it is a tab, indent, carriage return, or the like (block 1309). If it is, the program writes a control specifier corresponding to the control character to the document with the intermediate structure (block 1315). If it is not, the program examines the visual attributes associated with the character to determine whether they have changed (block 1311). If they have, it does the processing required to begin or end an attribute in the intermediate document, and it then outputs the character to the intermediate document (block 1313). Thereupon, the next character is fetched.

If the character is not part of the text, it is part of an informational attribute or some other non-textual entity such as a format. In that case, further processing depends on whether the character is a reference (block 1315). If it is, the current state is again saved and the position block is set to the start of the chain referred to by the reference (blocks 1323 and 1325). Thus, the next character processed by the main loop will be the first character of that chain. If the character is not a reference and the item currently being processed is not yet finished (decision block 1317), the character is processed as required for the item (block 1321). For example, if what is being processed is an informational attribute specifying a page break, the program will output a PB CTL 1107. If the item is finished, the program will restore the state saved when the processing of the item began (block 1319).

FIG. 14, finally, contains a detailed representation of the visual attribute processing performed in block 1311. In a present embodiment, the translation program receives attribute information about a character from the document of FIG. 10 in the form of a bit array indicating which attributes are on and which are off for that character. The translation program first compares the entire bit array associated with the current character with the entire bit array associated with the last character received from the block. If there is no change, the program goes directly to block 1313 (block 1401). If there has been a change, the program compares the two bit arrays bit by bit. If a bit in the array for the current character is the same as the corresponding bit in the array for the previous character, the program simply compares the next bits (block 1405). If the bits differ, the program determines from the comparison of the corresponding bits whether the visual attribute represented by the bits has been turned on or off (block 1409). If it has been turned on, the program writes the codes necessary to start the attribute to the intermediate document (block 1411); if it has been turned off, the program writes the codes necessary to end the attribute (block 1413).
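The FIG. 14 comparison maps naturally onto integer bit masks. The following is a minimal Python sketch that assumes a hypothetical bit assignment in which bit 0 is underscore, bit 1 is bold, and bit 2 is italics.

```python
# Hypothetical bit assignment: bit 0 = underscore, bit 1 = bold, bit 2 = italics.
ATTRIBUTE_BITS = ("underscore", "bold", "italics")

def attribute_changes(previous: int, current: int, names=ATTRIBUTE_BITS):
    """Return start (SAC) and end (EAC) codes for attributes whose bits changed."""
    if previous == current:              # whole-array comparison (block 1401)
        return []
    codes = []
    for bit, name in enumerate(names):   # bit-by-bit comparison
        was_on = (previous >> bit) & 1
        is_on = (current >> bit) & 1
        if was_on == is_on:
            continue                     # unchanged: compare the next bits (block 1405)
        codes.append(("SAC" if is_on else "EAC", name))  # blocks 1411 / 1413
    return codes
```

For example, going from underscore plus bold to bold plus italics ends the underscore and starts italics, while bold produces no codes.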

A concrete example of how the program works is provided by the processing of page 1 1027. During initialization, the program examines DT 1003 to determine whether there is a pointer to PIB 1007. If there is, there is text in the document, and the program outputs SOS `text` 1105. Using PIE 1008 for page 1 of the document in PIB 1007, the program locates text block (21), the first block in page 1 1027, and begins processing the first character in the block. That character is AC 1033 corresponding to FA 1035, so the program saves state and begins processing FA 1035. FA 1035 specifies a page break, and consequently, PB CTL 1107 is output to the document with the intermediate structure. FA 1035 also specifies a new format, the one referred to by FOR. Consequently, process char 1213 again saves state, locates block (35) containing format 1021, sets the state to specify the first character in block (35) and that the chain being processed is a format chain, and outputs SF CTL 1109. The main translation loop then forms format descriptors as required by the text of block (35). When etx 1031 in block (35) is reached, the program responds as shown in FIG. 12 for an etx 1031 which is the last in a chain. In this case, a control specifier is being processed, and thus no special end codes are required.

The program then restores the state saved when processing of format 1021 began and resumes processing FA 1035. The next item is reference HR for header 1019, so the program again saves the current state, outputs SOS `header` 1113, and begins processing T in header 1019. The first character in T of header 1019 is, however, AC 1033 referring to FA 1035 in A of header 1019. This FA 1035 contains only the reference FOR to format 1021. Process char 1213 therefore again saves the current state, outputs SF CTL 1109 following SOS `header` 1113, saves state again, produces header format descriptors 1115 from the text in format 1021, and restores state as previously described. Since there are no further items in FA 1035, state is again restored and the remaining characters in header 1019 are processed to produce header text 1117. When etx 1031 in header 1019 is reached, state is again restored and processing of FA 1035 continues.

The next item in FA 1035 is FR, referring to footer 1017, which is processed in the fashion described for header 1019. When processing of footer 1017 is finished, processing of AC 1033 in block (21) is finished, and the remaining text characters in the block and the remaining blocks of page 1 are processed to produce page 1 text 1129. When AC 1033 of block (9), the first block in page 2, is reached, FA 1035 in that block is processed. Since FA 1035 of block (9) specifies the same format, header, and footer as FA 1035 of block (21), there is no need to output a new SF CTL, header segment, or footer segment, and all that is output is PB CTL 1107 marking the end of page 1. Processing continues as described above until all of the components of the document have been translated.

Translation from the intermediate structure to the document structure of FIG. 10 employs the same general methods as translation in the other direction. First, the document structure is initialized by setting up the administrative blocks and the first index blocks and loading doc info blocks 1001 with the information from doc info block descriptors 1103. Then the processing of the contained segments begins. Each segment corresponds to a different text chain in the document structure of FIG. 10, and consequently, each time the beginning of a segment is encountered, processing of the current chain must be interrupted and processing of a new chain commenced. Each time the end of a segment is encountered, processing of the chain corresponding to the segment containing the segment which ended must resume. Again, the program uses the technique of saving state on a stack each time processing is interrupted and restoring state each time processing of a segment terminates.
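The stack discipline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program of the preferred embodiment: the token tuples ("SOS", name), ("EOS", None), and ("TEXT", string) and the `Chain` class are hypothetical simplifications of the intermediate structure and of the text chains of FIG. 10.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    """Hypothetical stand-in for one text chain of the target document."""
    name: str
    text: list = field(default_factory=list)


def compose(tokens):
    """Split a flat segment stream into chains, saving the interrupted
    chain on a stack at each start of segment and restoring it at each end."""
    chains = [Chain("main")]
    current = chains[0]
    saved = []  # stack of interrupted chains
    for kind, value in tokens:
        if kind == "SOS":
            saved.append(current)      # save state
            current = Chain(value)     # begin processing a new chain
            chains.append(current)
        elif kind == "EOS":
            if not saved:
                raise ValueError("end of segment without matching start")
            current = saved.pop()      # resume the containing chain
        else:                          # "TEXT"
            current.text.append(value)
    if saved:
        raise ValueError("unterminated segment")
    return chains


chains = compose([("TEXT", "Page one "), ("SOS", "header"),
                  ("TEXT", "Title"), ("EOS", None), ("TEXT", "continues")])
for chain in chains:
    print(chain.name, "".join(chain.text))
```

The header text ends up in its own chain, and the main chain resumes exactly where the header segment interrupted it.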

While a document translated from a given document structure into the intermediate document structure and then back to the original document structure will contain the same information as the original document, the final document structure may not be completely identical with the original document structure. For example, many of the text blocks of FIG. 10 contain attributes referring to a single header block 1019. In the intermediate document structure, a header segment is produced each time the header changes. The program which translates from the intermediate document structure to the structure of FIG. 10 may not check whether a given header segment is identical to a header segment which appeared previously in the document. If it does not perform such a check, the program will translate each header segment it encounters into a separate text block, and the resulting document structure will contain more text blocks and RIEs 1010 than the original document structure.
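The identity check mentioned above could look like the following sketch. It assumes that each header segment can be reduced to a hashable value such as its text (formatting descriptors are ignored here), and the block numbers it assigns are hypothetical.

```python
def assign_header_blocks(header_segments):
    """Map each header segment to a text block number, reusing the block of
    an identical header seen earlier (hypothetical block numbering)."""
    block_for_text = {}
    assignments = []
    for text in header_segments:
        if text not in block_for_text:
            block_for_text[text] = len(block_for_text)  # allocate a new block
        assignments.append(block_for_text[text])
    return assignments, len(block_for_text)


blocks, count = assign_header_blocks(["Q3 Report", "Q3 Report", "Appendix", "Q3 Report"])
print(blocks, count)  # [0, 0, 1, 0] 2
```

Without the dictionary lookup, the same input would produce four header blocks instead of two.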

AN IMPROVED INTERMEDIATE SPREADSHEET STRUCTURE

10. Introduction: FIG. 15

Further investigation of the intermediate document structure and the composition and extraction programs disclosed herein has shown that the intermediate document structure and the composition and extraction programs may be modified to permit translation of one type of spreadsheet to another type of spreadsheet.

A spreadsheet is a representation in the memory of a computer system of the tabular display produced by a spreadsheet program. An example of such a tabular display is shown in FIG. 15. In the display, the spreadsheet appears as a matrix of cells 1503. Each cell 1503 is addressable by its row and column number. A user may enter expressions (EXP 1511) into the cells 1503. When a cell contains an expression 1511, the value of the cell is the current value of the expression 1511. The expressions may include operands such as constants or the addresses of other cells 1503 and operators indicating the operations to be performed on the operands. When an expression 1511 is entered into the display of cell 1503, the spreadsheet program immediately computes the expression's value and displays the value in the cell 1503. If the expression 1511 contains an operand which is the address of another cell, the spreadsheet program computes the value of the other cell and uses that value to compute the value of the expression. Similarly, when a user changes the value of a cell 1503 whose value is used to compute the values of other cells, the spreadsheet program immediately recomputes all of the other values. When a user is finished working on a spreadsheet, the spreadsheet program saves the representation of the spreadsheet in non-volatile storage such as a disk drive.
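The dependency between cells can be illustrated with a toy model. In this sketch a cell holds either a number or a list of cell addresses whose values are summed, a hypothetical stand-in for a real formula. It recomputes on demand by recursion rather than eagerly propagating each change as the spreadsheet program described above does, and it rejects cyclic references instead of iterating on them.

```python
def cell_value(cells, addr, _visiting=frozenset()):
    """Evaluate a cell under a toy model: a cell holds either a number or a
    list of cell addresses whose values are summed (standing in for a
    formula). Empty cells evaluate to 0."""
    content = cells.get(addr, 0)
    if not isinstance(content, list):
        return content
    if addr in _visiting:
        raise ValueError(f"cyclic reference at {addr}")
    return sum(cell_value(cells, ref, _visiting | {addr}) for ref in content)


cells = {(1, 1): 2, (1, 2): 3, (1, 3): [(1, 1), (1, 2)]}
print(cell_value(cells, (1, 3)))  # 5
cells[(1, 1)] = 10                # the user changes a referenced cell
print(cell_value(cells, (1, 3)))  # 13
```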

As can be seen from the above description, spreadsheets resemble documents in that they are interactively produced by the user and then saved for later use. Spreadsheets further resemble documents in that there is a need to translate a spreadsheet produced by one spreadsheet program into a spreadsheet produced by another spreadsheet program. In the following, there is disclosed an intermediate spreadsheet structure which may be used with extraction and composition programs to translate one spreadsheet into another spreadsheet in a manner similar to that in which the intermediate document structure translates one document structure into another document structure.

11. The Spreadsheet Model

Spreadsheets are usually 2-dimensional matrices of formulas. Such a spreadsheet may be seen as having elements with a maximum dimensionality of 2: the rows are elements with a dimensionality of 1, and the entire matrix is an element with a dimensionality of 2. However, spreadsheets with other maximum dimensionalities are conceivable: spreadsheets with elements of three or more dimensions, spreadsheets with only a single 1-dimensional element (a single row of cells), and so on. In fact, some presently available spreadsheets effectively have a maximum dimensionality of three. Such spreadsheets contain 2-dimensional elements called grids, and the spreadsheet may be made up of multiple grids. To account for this, the spreadsheet model allows the definition of cells to occur in any number of dimensions. A simple way to view something n-dimensional is to view it one dimension at a time. At the lowest level of a spreadsheet is Cell 1503, the placeholder for an expression. Cells are 0-dimensional: they are points of data.

A set of Cells organized into a row makes up a 1-dimensional array of cells: a Vector 1505. The next dimension is made by lining up rows of cells one after another, forming a grid. The most consistent way to do that is to have a Vector 1507 that contains vectors, each of which contains cells. Further dimensions are made by nesting vectors.

The intermediate spreadsheet structure is shown in FIG. 16: cell 1503 is represented by a cell segment 1617; the row to which the cell belongs is represented by a 1-dimensional vector segment 1619; the matrix to which the cell and the row belong is represented by a 2-dimensional matrix segment 1627; and the entire spreadsheet is represented by spreadsheet segment 1629. A Spreadsheet segment 1629 always contains a Vector segment 1619. In a 1-dimensional spreadsheet, this vector contains a set of Cell segments 1617. In a 2-dimensional spreadsheet, this vector segment contains a set of vector segments 1619, which in turn contain cell segments.
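The nesting of FIG. 16 can be made concrete with a small serializer. This is a sketch only: it renders start and end of segment as "(" and ")", writes the dimensionality as an ad hoc "{dimensionality: 2}" marker, and uses a hypothetical "(Cell value)" form for cell contents. It also assumes a fully populated grid, so no cell addresses need to be emitted.

```python
def emit_segments(grid):
    """Serialize a fully populated 2-dimensional grid (a list of rows) as
    nested segments: a Spreadsheet segment containing a 2-dimensional vector
    segment, which contains one vector segment per row, which in turn
    contains one Cell segment per value."""
    parts = ["(Spreadsheet", "(vector {dimensionality: 2}"]
    for row in grid:
        parts.append("(vector")                       # start of row vector
        parts.extend(f"(Cell {value})" for value in row)
        parts.append(")")                             # end of row vector
    parts.append(")")  # end of 2-dimensional vector segment
    parts.append(")")  # end of Spreadsheet segment
    return " ".join(parts)


print(emit_segments([[1, 2], [3, 4]]))
```

A third dimension would simply add one more level of "(vector ...)" around the rows.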

Spreadsheet segment 1629 is optional. When used, it implies that the data being shipped is in fact a Spreadsheet; if the segment is not used, the data being shipped is merely data that fits nicely into the spreadsheet model, and can be used in any way desired.

In a preferred embodiment, the outermost vector segment of the intermediate spreadsheet structure (matrix segment 1627 in FIG. 16) must have a vector descriptor 1607 specifying the number of dimensions in the spreadsheet. The interpretation of other descriptors depends on having this information. Nested vector segments MUST have a descriptor specifying the vector's dimensionality IF the dimensionality is not exactly one less than their parent's dimensionality. Dimensionalities must decrease as nested vectors are entered, and no vector may have a 0 or negative dimensionality. Dimensions are ordered by assigning each dimension a number and referring to the dimensions in decreasing order. The deeper a segment is nested, the more values are required for the current address.
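The dimensionality rules above can be checked along one path of nested vector segments. In this sketch the input list is a hypothetical representation: the explicit dimensionality of each vector, outermost first, or None where the descriptor is omitted. The empty-spreadsheet case (dimensionality 0) is not handled here.

```python
def resolve_dimensionalities(declared):
    """declared: the explicit dimensionality of each vector along one path of
    nesting, outermost first, or None where the descriptor is omitted.
    Returns the effective dimensionality of each vector."""
    if not declared or declared[0] is None:
        raise ValueError("outermost vector must declare its dimensionality")
    resolved = []
    parent = None
    for value in declared:
        dim = parent - 1 if value is None else value  # default: parent - 1
        if dim <= 0:
            raise ValueError("no vector may have a 0 or negative dimensionality")
        if parent is not None and dim >= parent:
            raise ValueError("dimensionality must decrease as vectors nest")
        resolved.append(dim)
        parent = dim
    return resolved


print(resolve_dimensionalities([3, None, None]))  # [3, 2, 1]
print(resolve_dimensionalities([3, 1]))           # [3, 1]: a dimension skipped explicitly
```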

12. Cell Addressing

There are two ways to specify the address of a cell or of a square-edged group of cells (cell group 1509 in FIG. 15) in any number of dimensions. If the group to be addressed is within the set of cells defined by the most recently opened vector segment, then local addressing can be used. If it is outside the most recently opened vector, global addressing must be used. There are separate descriptors for the different addressing modes.

Both modes provide a way to specify the address of a single cell 1503 or cell group 1509 in any number of dimensions. They are used, e.g., to assign a name to a rectangular region of cells. They are unusual descriptor groups because the order of the descriptors within them is meaningful.

Both follow the same basic pattern. For each dimension being expressed, a pair of descriptors is used (one of the descriptors is optional in certain cases). A group of cells is specified by identifying two opposite corners of the group, that is, two n-tuples. For example, cell group 1509 is identified by r2c3 and r3c4. From this, two descriptors are derived for each dimension, as shown in FIG. 17. The first descriptor 1703 is the initial value for the dimension, and the second descriptor 1705 is the final value. Thus, for cell group 1509, the first descriptor for the first dimension specifies 2 and the second 3, while the first descriptor for the second dimension specifies 3 and the second 4. Descriptors are ordered from the smallest to the greatest dimension.
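Deriving the descriptor pairs from two corners is a per-dimension pairing. The sketch below assumes the corners are given as tuples in the same dimension order as the worked example, and that the initial value should never exceed the final one, so it takes the minimum and maximum per dimension.

```python
def corners_to_descriptors(corner_a, corner_b):
    """Derive one (initial, final) descriptor pair per dimension from two
    opposite corners of a cell group, each given as an n-tuple in
    dimension order. Assumes initial <= final in every dimension."""
    if len(corner_a) != len(corner_b):
        raise ValueError("corners must have the same number of dimensions")
    return [(min(a, b), max(a, b)) for a, b in zip(corner_a, corner_b)]


# Cell group 1509 of FIG. 15: corners r2c3 and r3c4.
print(corners_to_descriptors((2, 3), (3, 4)))  # [(2, 3), (3, 4)]
```

Taking the minimum and maximum makes the result independent of which pair of opposite corners the caller happens to supply.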

While the {initial} descriptor is required, one for each dimension being expressed, the {final} descriptor is optional. If it is not given, the meaning implied is as if it existed and contained the same value as the associated {initial} descriptor. Thus, {initial 11} {initial 15} refers to the same cells as {initial 11}{final 11} {initial 15}{final 15}; they both refer to just one address, (11,15). In some applications, however, a distinction is made between single cell referencing and references to a group of cells in which the "group" happens to be a single cell. Because of this, a reference to a single cell is presumed to be a "single cell reference" if no {final} descriptors ever appear in it, that is, if it is given the smallest representation possible. If any {final} descriptors appear, it is assumed that the reference is to a group of cells, even if the "group" contains exactly one cell. Thus, in some applications, {initial 11} {initial 15} may carry a subtly different meaning than {initial 11} {initial 15}{final 15}, even though the exact same cell and number of cells are referenced.
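The two rules above (an omitted {final} defaults to its {initial}, and only the smallest representation counts as a single cell reference) can be captured in a small class. `CellRef` and its methods are hypothetical names chosen for this sketch; None marks a {final} descriptor that did not appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellRef:
    """A decoded reference: one initial value per dimension and, per
    dimension, a final value or None when {final} was omitted."""
    initial: tuple
    final: tuple

    def bounds(self):
        # An omitted {final} means the same value as the matching {initial}.
        return [(i, i if f is None else f) for i, f in zip(self.initial, self.final)]

    def is_single_cell_reference(self):
        # Only the smallest representation (no {final} at all) counts as a
        # "single cell reference"; any {final} makes it a group reference.
        return all(f is None for f in self.final)


a = CellRef(initial=(11, 15), final=(None, None))   # {initial 11} {initial 15}
b = CellRef(initial=(11, 15), final=(None, 15))     # {initial 11} {initial 15}{final 15}
print(a.bounds() == b.bounds())                     # True
print(a.is_single_cell_reference(), b.is_single_cell_reference())  # True False
```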

Global Cell Referencing

In global addressing, any cell in the spreadsheet can be referenced directly. An initial descriptor 1703, or initial and final descriptors 1703 and 1705, are used for each of the dimensions, starting with dimension 0 and increasing. Global addressing could be used for any kind of reference; in practice, local addressing is used where possible because it is more compact. Global addressing is most commonly used in cell segments 1617, because global addressing allows addressing relative to the current position, and in cell segments 1617 the current position is always completely known, so local addressing could not address any other cell.

The initial and final descriptors 1703, 1705 each have an absolute form and a relative form. The absolute form gives the position of the cell relative to origin 1513 of the spreadsheet. The relative form gives the position relative to the current position. Of course, that dimension of the current position must be determinate to be used as a base for relative addressing. Inside of Cell segments 1617, all n values of the address are known, so relative addressing could be used with any of the parts of an address.
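Resolving a single descriptor value to an absolute coordinate is straightforward once the current position is known. In this sketch, `to_absolute` and its `is_relative` flag are hypothetical; in the structure itself the form is distinguished by the descriptor type code.

```python
def to_absolute(value, is_relative, current=None):
    """Convert one initial or final descriptor value into an absolute
    coordinate along its dimension. Absolute values are already measured
    from origin 1513; relative values are offsets from the current
    position, which must be determinate in that dimension."""
    if not is_relative:
        return value
    if current is None:
        raise ValueError("relative address needs a determinate current position")
    return current + value


# Inside a Cell segment at position 4 along dimension 0, one cell back:
print(to_absolute(-1, True, current=4))  # 3
```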

Local Cell Referencing

Often, most of the n-tuple making up an address is known: dimension descriptors in vector segments (except for the outermost one) specify the higher dimensions of an address. At the level of a Cell segment, all n values are known. Outside of Cells, though, addresses tend to be partially specified: the higher dimensions of an address are known (they are specified by enclosing Vector segments), but lower ones are not yet resolved. Local addressing treats the already defined, higher-dimensional addresses as given and absolute, and just goes on to specify the lower addresses. This means that an address specified in local mode can only be used to reference cells defined within the vector segment in which the reference itself occurs, and also that relative addressing is meaningless. Local addressing is very good for giving a group of cells within a vector some property, such as a collective name.
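A local reference becomes a full address by appending the higher-dimensional values fixed by the enclosing vector segments. The sketch assumes both parts are listed lowest dimension first, following the ordering rule for descriptors. The function name and argument shapes are hypothetical.

```python
def complete_local_reference(local_bounds, enclosing):
    """local_bounds: (initial, final) pairs given by a local reference for the
    lowest dimensions, dimension 0 first. enclosing: the addresses already
    fixed by the enclosing vector segments for the remaining, higher
    dimensions, lowest first. Returns one (initial, final) pair per
    dimension, dimension 0 first."""
    return list(local_bounds) + [(value, value) for value in enclosing]


# In a 3-dimensional spreadsheet, inside the vector at address 7 of
# dimension 1 that sits in the vector at address 2 of dimension 2,
# a local reference to cells 3..5 along dimension 0:
print(complete_local_reference([(3, 5)], [7, 2]))  # [(3, 5), (7, 7), (2, 2)]
```

Because the higher dimensions come from the enclosing vectors, a local reference cannot reach cells outside them, which is the restriction stated above.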

Things Common To Both Modes

It is meaningful, in either mode, not to fully resolve an address. That is, in a 3-dimensional vector, a group of cells might only give two dimensions' worth of limits. Unspecified dimensions are presumed to include all possible addresses in those dimensions. Thus, when neither the initial nor the final descriptor appears, negative infinity is used for the initial address and positive infinity is used for the final address.

When it is necessary or desired to make explicit reference to an address infinitely far along a dimension, a special convention is used. Any descriptor for an absolute address (either initial or final) which contains no actual data is assumed to reference the addresses as far as possible from the origin in the appropriate direction. Note that when initial infinity is used, it means the SMALLEST possible address, while implicit or explicit final infinity refers to the GREATEST possible address.

It is illegal to specify more dimensions than exist in the spreadsheet in any cell address.
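The conventions common to both modes can be sketched as a bounds expander. Several assumptions apply: the given pairs cover the lowest dimensions, any omitted {final} has already been filled in from its {initial}, and the hypothetical `EMPTY` marker stands for a descriptor that is present but holds no data. Infinity is represented with `math.inf`.

```python
import math

EMPTY = object()  # hypothetical marker: descriptor present but holding no data


def expand_bounds(pairs, n_dims):
    """Turn (initial, final) pairs for the lowest dimensions into bounds for
    all n_dims dimensions of the spreadsheet."""
    if len(pairs) > n_dims:
        raise ValueError("reference names more dimensions than the spreadsheet has")
    out = []
    for initial, final in pairs:
        lo = -math.inf if initial is EMPTY else initial  # SMALLEST address
        hi = math.inf if final is EMPTY else final       # GREATEST address
        out.append((lo, hi))
    # Unspecified dimensions include every possible address.
    out.extend([(-math.inf, math.inf)] * (n_dims - len(pairs)))
    return out


print(expand_bounds([(2, EMPTY)], 3))  # [(2, inf), (-inf, inf), (-inf, inf)]
```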

13. Contents of Cell Segment 1617: FIG. 18

If a spreadsheet cell is empty, the cell segment 1617 representing it will have no contents. If the cell contains an expression, the contents of the spreadsheet cell segment 1617 will specify the expression, and if the expression has a present value, the contents will specify the present value and its data type. The manner in which these items are specified in a preferred embodiment is shown in FIG. 18. The first item is cell data type descriptor 1802, which specifies the data type of the cell's present value. Cell data type descriptor 1802 consists of SOD 1801 and EOD 1805 for that type of descriptor and a type code (Data TC) 1805 for the value's type. The next item is expression control 1807, a CTL 630 which indicates that what follows is an expression 1831. Expression 1831 is represented by means of operand descriptors 1821 for the operands and an operator descriptor 1822 for the operator to be applied to the operands. In a preferred embodiment, postfix notation is used, i.e., operator descriptor 1822 follows the descriptors 1821 for all of its operands. Each operand descriptor contains SOD 1809 and EOD 1817 for an operand descriptor and a nested operand data type descriptor 1810, containing SOD 1811 and EOD 1815 for an operand data type descriptor and an operand data type code 1813 specifying the data type of the operand. Following the operand descriptor is the expression which defines the value of the operand (OP EXP) 1817. OP EXP 1817 may be a constant, the address of another cell, or a nested expression 1831. Expressions 1831 may be nested to any depth. Following operand descriptor 1821 come operand descriptors 1821 for any other operands required for the operation. Following all of the operand descriptors 1821 is operator descriptor 1822, which contains SOD 1823 and EOD 1827 for an operator and operator type code 1825. If the result of the expression was known when the intermediate spreadsheet structure was created, result 1829 contains the value of the result.
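The ordering of FIG. 18 (all operand descriptors first, then the operator descriptor, with nested expressions carried inside an operand) can be sketched as a flattening function. The tuple shapes and the "operand:..." / "operator:..." text tokens are hypothetical stand-ins for the SOD/EOD-delimited descriptors, and data type descriptors are omitted for brevity.

```python
def encode(expr):
    """expr is (operator, [operands]); an operand is a number,
    ('ref', address), or a nested expr. Returns postfix tokens: every
    operand before its operator, with a nested expression wrapped in
    EXPR[...] to mirror an expression control inside an operand."""
    op, operands = expr
    tokens = []
    for operand in operands:
        if isinstance(operand, tuple) and operand[0] == "ref":
            tokens.append(f"operand:ref{operand[1]}")           # cell address
        elif isinstance(operand, tuple):
            tokens.append("operand:EXPR[" + " ".join(encode(operand)) + "]")
        else:
            tokens.append(f"operand:{operand!r}")              # constant
    tokens.append(f"operator:{op}")
    return tokens


print(encode(("+", [2, ("ref", (1, 3))])))
# ['operand:2', 'operand:ref(1, 3)', 'operator:+']
```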

As will be explained in more detail in the following, cell segment contents 1611 may include other descriptors which specify information including the represented cell 1503's address in its row, the cell's name, whether it is protected from modification, the data type required for its values, and the format for display of the cell.

14. Detailed Description of the Intermediate Spreadsheet Structure

The following is a detailed description of a presently preferred embodiment of the intermediate spreadsheet structure. The following notation is used in this description:

    (    SOS 605          )    EOS 639
    {    SOD 611          }    EOD 617
    !    CTL 630

The character string immediately following the left parenthesis, left brace, or ! indicates the name of the segment, descriptor, or control, respectively. In the case of descriptors, the value following the descriptor's name is the descriptor type code (DTC) 609 for the descriptor; next comes the descriptor's content, expressed as a number and type of values. "*" indicates a variable number of values. For example,

    {absolute initial     1     1:2 byte int}

defines a descriptor which specifies an absolute initial address in a dimension. 1 is the DTC 609 for the descriptor, and 1:2 byte int indicates that the initial address is indicated by means of a single two-byte integer value. The description of each construct includes all of the other constructs which may be included in that construct. Which constructs are in fact included of course depends on the spreadsheet being translated. The term "group" in a construct indicates a group of descriptors which contain information of a kind set forth in the description of the construct. For example, a "cell reference descriptor group" is a set of descriptors which specifies a cell or group thereof.

The description also refers to vector segments and cell segments as "siblings" and "children" and to vector segments as "parents". This terminology has the usual meaning: if a vector segment immediately contains other vector segments or cell segments, that vector segment is the parent of the immediately contained vector segments or cell segments, and the immediately contained segments are siblings of each other and children of the parent vector segment.

15. Expression Control 1807

The expression control is used to express any arithmetic function and some non-arithmetic functions. In a preferred embodiment, expression controls always represent functions that return a single value; that is, no expressions return matrices of values. In other embodiments, expressions may return matrices. Expressions might refer to a matrix of data, but they return a datum. Subexpressions can be nested within expressions; the method of representation is postfix. The term operator is used in the general sense; there can be SIN or LOG operators. Operators can take any number of operands, from 0 upward; most operators expect a definite number, however. Specifying too few or too many results in undefined behaviour, which can include rejecting the expression as erroneous. Operands can be constants (of a variety of different datatypes), references to cells (or groups of cells), or expressions.

The descriptors belonging to the Expression control contain these Operators and Operands. The order of operands is always significant. The order might not be significant arithmetically, as in 3+7 versus 7+3, but the order of terms in the expression should be stored (if possible) in the order in which they were entered by the user.

The expression control uses a postfix convention, with the curious adaptation that sub-expressions can be expressed by embedding another expression control in an operand. This makes absolutely no difference to the value of the postfix expression; it just provides a way to express parenthesized expressions the way the user did. Of course, any expression can be laid out in "flat" postfix. But such nesting is useful when an operator takes a variable number of operands, such as Average(x, y, z, ...): in these cases, the arguments and operator are put in a sub-expression, and the operator is assumed to consume all active operands. Note that "2 3 + 4 Average" is the average of 5 and 4 (4.5), not of 2, 3, and 4 (3).
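The "consume all active operands" rule is easiest to see in a small evaluator. In this sketch a Python list inside the token list stands for an expression control embedded in an operand and is evaluated on its own stack, so a variadic operator consumes only the operands of its own sub-expression. The operator names and the token representation are assumptions for illustration.

```python
from statistics import mean

BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
          "*": lambda a, b: a * b, "/": lambda a, b: a / b}


def evaluate(tokens):
    """Evaluate a postfix token list; a nested list is a sub-expression
    evaluated on its own stack."""
    stack = []
    for tok in tokens:
        if isinstance(tok, list):
            stack.append(evaluate(tok))        # nested expression control
        elif tok in BINARY:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY[tok](a, b))
        elif tok == "Average":
            stack[:] = [mean(stack)]           # consumes every active operand
        else:
            stack.append(tok)                  # constant operand
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(evaluate([2, 3, "+", 4, "Average"]))  # 4.5: the average of 5 and 4
print(evaluate([2, 3, 4, "Average"]))       # 3
```

Wrapping the Average in its own sub-expression, as in `[1, [2, 4, "Average"], "+"]`, keeps the outer operand 1 out of the average.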

There are other cases in which emitting !expression within {operand} is recommended:

1) Whenever the operator is a function, especially one that might not be known where the stream is going. This is a good idea because, if the function is not known, an intelligent composition program will toss the subexpression but keep the rest intact. In any case, most functions take the form of a parenthesized expression (see reason 3).

2) Whenever it is positively known that the expression is corrupted. If the corrupted part of the expression can be isolated in a subexpression, it can be better dealt with by the composition program.

3) Whenever parentheses were used when the expression was typed in (assuming it was typed in as infix). If the destination stores or displays the expression as infix, the parentheses can then be reconstructed as they were entered. The expression will be correct whether or not this is done, but it is best to preserve the user's expression as it was typed, when possible.

16. Address Descriptors 1701

    Global Cell Reference:
        {absolute_initial     1     1:2 byte int}
        {absolute_final       2     1:2 byte int}
        {relative_initial     3     1:2 byte int}
        {relative_final       4     1:2 byte int}
    Local Cell Reference:
        {absolute_initial     1     1:2 byte int}
        {absolute_final       2     1:2 byte int}

17. Datatype Descriptors 1802

Some segments have a "settable" datatype. In these cases, they have a default datatype and can have a descriptor which sets the datatype. The descriptor contains the actual type code. A spreadsheet may have a datatype that is unknown to the extraction program: the "N/A" (not available) number element. These become ERROR datums, with an error code of 0.

18. Spreadsheet Segment 1629 and Spreadsheet Descriptors 1603

    (Spreadsheet                                  no datatype
    {grid flag              2     1:boolean}
        Whether grid lines are used to delimit cells from their neighbors
        when cells are displayed. Applies to all dimensions.
    {recalc count           3     1:4 byte integer}
        The number of times to iterate on cyclic references. The default is
        0, implying that no recalculation is done when cyclic references occur.
    {recalc expression      4     none}
        Contains an !expression which evaluates to TRUE (nonzero) while
        recalculation should continue. If none is given, the default is
        FALSE, meaning that no recalculation is performed.
    {recalc dimension       5     *:2 byte enum}
        Contains a list of priorities to obey while recalculating cyclic
        references. Each integer names a dimension to "sweep through" when
        doing recalculation:
            0    East/West
            1    North/South
            2    up/down (vertical) . . .

Thus, {recalc dimension 0 1} implies that recalculation is done by sweeping through rows, and within rows, downward through columns. Such sweeps occur from the lowest cell address to the highest; by default, any dimensions missing from the list are filled in after the listed ones, in increasing order. However, if this descriptor does not appear, recalculation is not done by sweeping dimensions at all.
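A sweep order can be generated from the priority list and the extent of each dimension. The text does not pin down which listed dimension forms the outer loop, so this sketch makes an explicit assumption: the first dimension in the list varies fastest. It also assumes every dimension runs from address 0 upward. Missing dimensions are appended in increasing order, as described above.

```python
from itertools import product


def sweep_order(priorities, sizes):
    """List cell addresses (dimension 0 first) in the order a recalculation
    sweep visits them. Assumption: the first dimension in `priorities`
    varies fastest; dimensions missing from `priorities` are appended in
    increasing order. Each dimension runs from address 0 upward."""
    n = len(sizes)
    order = list(priorities) + [d for d in range(n) if d not in priorities]
    visits = []
    # product() varies its last argument fastest, so feed the order reversed.
    for combo in product(*(range(sizes[d]) for d in reversed(order))):
        address = [0] * n
        for d, value in zip(reversed(order), combo):
            address[d] = value
        visits.append(tuple(address))
    return visits


print(sweep_order([0], (2, 2)))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```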

    {last edit cell         6     local cell reference group}
        Describes the address of the last cell to be modified.
    {border display         7     *:boolean}
        A boolean per dimension, indicating whether a border is displayed
        after the extreme cells along that dimension. The default is FALSE
        for each missing dimension.
    {rule precedence        8     *:2 byte int}
        What to do when the various rules specified to operate on dimensions
        collide. For example, the stream might specify one set of rules for
        column x and another for row y; where they intersect, the sets of
        rules collide. This descriptor establishes the ordering to apply to
        the sets of rules, in decreasing order of priority.

19. Vector Segments and Vector Descriptors 1607

    ______________________________________                                        (vector               no datatype                                             {dimensionality  1      1:2 byte int}                                         This descriptor must be the first descriptor in the                           outermost vector of the spreadsheet. The integer                              indicates the (nonnegative) number of dimensions to                           be expressed in this spreadsheet. Default is 0,                               which would be an empty spreadsheet.                                          {vector address  1      1:2 byte int}                                         The address of this vector, as viewed by its                                  parent. Meaningless on the outermost vector                                   segment. Default is the previous sibling's address                            plus one, or if there is no previous sibling, the                             value of the parent's first child address. This                               allows empty vectors to be skipped easily.                                    {first child address                                                                           2      1:2 byte int}                                         The smallest address in use among the children of                             this vector. Default is 0.                                                    {cell name       3      group + *:text}                                       The name of a group of cells enclosed within this                             vector. The cells are named by the cell reference                             descriptor group within. If none appears, all cells                           enclosed by this vector are named. 
    If multiple groups of cells are given the same name, the name
    references them all, even if they are disjoint.
{default cell protection    4    group + 1:bool}
    The default protection of a group of cells enclosed within this
    vector; TRUE means protected. The cells are named by the cell
    reference descriptor group within. If none appears, all cells
    enclosed by this vector are affected.
{cell violation action    5    group + 1:1 byte enum}
    The default action to take when a protected cell is entered:
        -1    honor the protection; skip this cell when navigating.
         0    honor the protection.
         1    ignore the protection; allow the cell to be modified.
    The cells are named by the cell reference descriptor group
    within. If none appears, all cells enclosed by this vector are
    affected.
{default cell format    6    none}
    This contains two descriptors, each holding a group: one names a
    set of cells, and the other describes the formatting to be
    applied to them:
        {cell reference    1    group}
        {cell format       2    group}
{default display mult    7    group + 1:float_8}
    The default value by which to multiply numeric values when
    displaying the value of a cell. This changes only the display,
    not the cell's value. The cells are named by the cell reference
    descriptor group within. If none appears, all cells enclosed by
    this vector are affected.
{default cell type    8    group + 2:1 byte enum}
    The only data type legal in the named cells. If it is not given,
    the cells may contain instances of any datatype. The cells are
    named by the cell reference descriptor group within. If none
    appears, all cells enclosed by this vector are affected. Note
    that this does not actually declare a datatype for the purposes
    of parsing Cell segments; in fact, a subsequent Cell segment
    under the influence of this descriptor could contain a different
    datatype. This affects only what data may be added to the cells
    in the future.
______________________________________
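The vector-level defaults above interact with a cell's own protection descriptors. The following Python sketch shows one way a composing program might resolve them. The names (`ViolationAction`, `vector_default_applies`, `effective_protection`, `entry_behavior`) are hypothetical, cell references are simplified to integer addresses within one vector, and the rule that a cell's own {cell protection} descriptor overrides the vector default is an assumption rather than something stated above.

```python
from enum import IntEnum
from typing import Optional, Set


class ViolationAction(IntEnum):
    """Values of the {cell violation action} descriptor."""
    SKIP = -1    # honor the protection; skip the cell when navigating
    HONOR = 0    # honor the protection
    IGNORE = 1   # ignore the protection; allow modification


def vector_default_applies(cell_address: int, referenced: Optional[Set[int]]) -> bool:
    """A vector default covers the cells named by its cell reference group,
    or every enclosed cell when no such group appears (referenced is None)."""
    return referenced is None or cell_address in referenced


def effective_protection(cell_protection: Optional[bool],
                         vector_default: Optional[bool],
                         cell_address: int,
                         referenced: Optional[Set[int]]) -> bool:
    if cell_protection is not None:
        return cell_protection      # assumption: the cell's own descriptor wins
    if vector_default is not None and vector_default_applies(cell_address, referenced):
        return vector_default
    return False                    # assumption: unprotected when nothing applies


def entry_behavior(protected: bool, action: int) -> tuple:
    """Return (may_modify, skip_when_navigating) for entering a cell."""
    if not protected or action == ViolationAction.IGNORE:
        return True, False
    return False, action == ViolationAction.SKIP
```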

20. Descriptors and Controls for Cell Segments

______________________________________
(Cell
{cell address    1    1:2 byte int}
    The address of this cell, as viewed by its parent vector. If this
    descriptor is missing, the default is the previous sibling's
    address plus one or, if there is no previous sibling, the value
    of the parent's first child address.
{cell name    3    *:text}
    The name of this cell.
{cell protection    4    1:bool}
    The protection applied to this cell; TRUE means protected.
{cell violation action    5    1:1 byte enum}
    The default action to take when this cell (if protected) is
    entered:
        -1    honor the protection; skip this cell when navigating.
         0    honor the protection.
         1    ignore the protection; allow the cell to be modified.
{cell format    6    group}
    This contains a group of descriptors that describe the display
    format for the cell. The cell format group is described below.
{display mult    7    1:float_8}
    The default value by which to multiply numeric values when
    displaying the value of this cell. This changes only the display,
    not the cell's value. The default is no multiplier (1.0).
{cell type    8    2:1 byte enum}
    The only data type legal in this cell. If it is not given, the
    cell may contain instances of any datatype. Note that this does
    not actually declare a datatype for the purposes of parsing this
    segment; in fact, this cell could contain a different datatype.
    This affects only what data may be added to the cell in the
    future.
{datatype    2    2:1 byte enum}
    The datatype of the cell's current value. The default is
    float_8. Note that this descriptor is used to determine how to
    parse any data within the current Cell segment.
!Expression
{operand    1    1:float_8 (settable)}
    Operand descriptors may contain other descriptors, including cell
    reference groups, a !Expression control, or a value. (Operands
    containing multiple sources of values, such as both a cell
    reference and an expression control, are assumed to be
    order-irrelevant: a composing process can build an expression
    with them in any order.) Descriptors that may be contained in
    {operand} are:
        {global cell reference    1    global cell reference}
        {datatype                 2    2:1 byte enum}
{operator    2    1:2 byte enum}
    The operator to be applied to some preceding number of postfix
    stack atoms; see the !Expression control explanation above for
    details.
______________________________________
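The defaulting rule for {cell address} can be sketched as follows, assuming the Cell segments of one vector have already been decoded, in order, into a list in which a missing descriptor is represented by `None`. The function name `resolve_cell_addresses` is hypothetical.

```python
from typing import List, Optional


def resolve_cell_addresses(explicit: List[Optional[int]], first_child_address: int) -> List[int]:
    """Fill in missing {cell address} descriptors for the Cell segments of one vector.

    explicit[i] is the decoded address of the i-th Cell segment, or None when
    the descriptor is absent.
    """
    resolved: List[int] = []
    for address in explicit:
        if address is None:
            # Previous sibling's address plus one; the parent's first child
            # address when there is no previous sibling.
            address = resolved[-1] + 1 if resolved else first_child_address
        resolved.append(address)
    return resolved
```

Because only non-empty cells are given Cell segments, an explicit {cell address} is needed only after a gap; for example, `resolve_cell_addresses([None, None, 5, None], 1)` yields the addresses 1, 2, 5, 6.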

The Format Descriptor Group

This holds the definition of a cell's format, which includes almost all the information required to display the cell's value. A cell can contain instructions for displaying a variety of kinds of data; it can offer one format for numbers and specify different directions in case it happens to contain a date, and so on.

______________________________________
Cell Display Format Descriptors
______________________________________
{display_data    1    2:1 byte ints}
    The first integer indicates whether the expression contained by
    the cell is displayed; the second, whether the expression's value
    is displayed. If neither is on, the cell appears blank. The
    values used are:
        -1    sometimes displayed: depends on the display software's
              view of what fits and would look nice.
         0    never displayed.
         1    always displayed.
{display_repeat    2    1:1 byte bool}
    Indicates whether the cell's content is displayed repetitively
    until the cell's window is filled.
{extend_display    3    1:1 byte bool}
    Indicates whether the cell's content is displayed extending to
    the right, beyond the cell's boundary, repeating as needed to
    cover blank cells, until a cell with its own display is reached
    (or a border is encountered). Note: if the cell is set to be
    displayed with "centered alignment", the content is displayed
    extending downward instead of to the right, until a cell with its
    own display (or a border) is reached.
{RID    4    1:2 byte int}
    This format's identifier.
{name    5    *:text}
    This format's name.
______________________________________
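The tri-state values of {display_data} can be read as in this sketch. How the -1 ("sometimes") setting is resolved is left to the display software; the `expression_fits` and `value_fits` flags used here are a hypothetical stand-in for that judgment, and the names `Display` and `parts_to_show` are likewise illustrative.

```python
from enum import IntEnum
from typing import Tuple


class Display(IntEnum):
    """Values used in each integer of {display_data}."""
    SOMETIMES = -1  # left to the display software's judgment
    NEVER = 0
    ALWAYS = 1


def parts_to_show(display_data: Tuple[int, int],
                  expression_fits: bool = True,
                  value_fits: bool = True) -> Tuple[bool, bool]:
    """Return (show_expression, show_value); (False, False) means a blank cell."""
    def decide(setting: int, fits: bool) -> bool:
        if setting == Display.SOMETIMES:
            return fits  # hypothetical policy: show the part only if it fits
        return setting == Display.ALWAYS

    expression_setting, value_setting = display_data
    return (decide(expression_setting, expression_fits),
            decide(value_setting, value_fits))
```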

All the rest of the descriptors are used on a per-datatype basis and are embedded in descriptors that represent that datatype. Each descriptor for a type may include a group of descriptors defining how data of that type is to be displayed. The descriptors permitted in each group follow the descriptor for the type.
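As a sketch of this per-datatype arrangement, assume a decoded {cell format} group is held in a Python dict keyed by descriptor code (6 numeric format, 7 dates and times, 8 boolean, 9 text, as listed in the tables that follow). A display program might then select the applicable group as shown. The function name is hypothetical, and selecting by Python type is an illustrative simplification; a real implementation would more likely key off the cell's {datatype} descriptor.

```python
from datetime import date
from typing import Any, Dict, Optional

# Descriptor codes of the per-datatype groups within a cell format group.
NUMERIC_FORMAT, DATES_AND_TIMES, BOOLEAN, TEXT = 6, 7, 8, 9


def format_group_for(value: Any, cell_format: Dict[int, dict]) -> Optional[dict]:
    """Pick the per-datatype display group that applies to a cell's value.

    Returns None when no group is given for the value's type, leaving the
    display to defaults.
    """
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        code = BOOLEAN
    elif isinstance(value, (int, float)):
        code = NUMERIC_FORMAT
    elif isinstance(value, date):        # datetime is a subclass of date
        code = DATES_AND_TIMES
    else:
        code = TEXT
    return cell_format.get(code)
```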

______________________________________
Format Descriptors for Numeric Values
______________________________________
{numeric format    6    group}
    Format information for numeric display.
{decimal point string    1    *:text}
    The characters to use as the decimal point.
{thousands separator    2    *:text}
    The string to use between digit triples, indicating thousands.
    If not given, no characters are used to mark thousands.
{decimal places    3    1:1 byte int}
    The number of decimal places to display right of the decimal
    point. At display time, values should be rounded to accommodate
    this number of digits. The value 0x80 (negative 1-byte infinity)
    implies that rounding is done only as needed, for instance to fit
    a cell boundary. The value 0x7F implies that special steps should
    be taken to present the number with all possible precision, for
    instance by displaying the number as a fraction if possible.
{scientific    4    1:1 byte int}
    Whether to use scientific format (nnE+mm) to express a number:
        -1    use scientific format if it makes the display easier
              to read.
         0    do not use scientific format.
         1    always use scientific format.
{currency flag    5    1:2 byte integer}
    Indicates whether the value represents currency and should be
    displayed as such. In a preferred embodiment, 0 indicates that
    the number is not currency, and any other value indicates that
    it is. Specifically, -1 indicates that the currency type is
    unknown, and other values might be used to denote the particular
    currency type (US dollar, yen, etc.).
{currency string    6    *:text}
    The string to prepend to the number to indicate that it is
    currency. If the currency string is given, it is ALWAYS applied,
    even if it conflicts with the content of the currency flag
    descriptor.
{percent flag    7    *:text}
    Indicates that the number is to be displayed with the given
    string trailing, as an indication that the value is a percentage
    (the string is generally "%"). This makes no assumptions about
    the value presented; the value .5 would be presented as .5%, not
    50% (but see the multiplier descriptor).
{multiplier    8    1:float_8}
    Indicates that the value should be multiplied by the given value
    before it is displayed. This does not change the cell's actual
    value; only the display is altered. Useful in conjunction with
    {percent flag}.
{positive prefix string    9    *:text}
    A string to prepend to positive numbers at display time. Defaults
    to nothing. This prepending occurs after modifications made by
    other descriptors, e.g., {currency string}.
{negative prefix string    10    *:text}
    Like {positive prefix string}, except that the default is "-" if
    the descriptor does not appear at all.
{positive suffix string    11    *:text}
    A string to append to positive numbers at display time. Defaults
    to nothing. This occurs after modifications made by other
    descriptors, e.g., {percent flag}.
{negative suffix string    12    *:text}
    Like {positive suffix string}, but applied to negative numbers.
{alignment    13    1:1 byte int}
    How to align the number within the cell:
        -1    align in whatever way makes for the best display.
         0    no specific rule (use the default or a more global
              setting).
         1    left-justify the number.
         2    center the value within the cell.
         3    right-justify the number.
______________________________________
Format Descriptors for Dates and Times
______________________________________
{dates and times    7    group}
    Format information for the display of dates and times.
{ordering    1    *:1 byte int}
    The order of fields for dates and times. A field that is not
    mentioned is not displayed.
         0    year
         1    month
         2    day
         3    day of week
         4    hour
         5    minute
         6    second
         7    millisecond
______________________________________

Thus, {0 2 1 4 5 6} implies that the date and time are displayed as Year Day Month Hour Minute Second, and that milliseconds and the day of the week are not displayed. If the descriptor does not appear, the default is to display in whatever order seems best to the application; in the US, a common order would be 3 1 2 0 4 5. If the descriptor appears but is empty, no date-time information can be displayed.
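The ordering codes can be applied as in the following sketch, which renders every field as plain digits and joins fields with the single-space default of {field separator}. The names are hypothetical, and using the US order 3 1 2 0 4 5 as the application's default is only one possible choice.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Application-chosen default; the text cites 3 1 2 0 4 5 as a common US order.
US_DEFAULT_ORDER = [3, 1, 2, 0, 4, 5]


def date_fields(moment: datetime) -> Dict[int, str]:
    """Digit renderings of each {ordering} field code (0 = year ... 7 = millisecond)."""
    return {
        0: str(moment.year),
        1: str(moment.month),
        2: str(moment.day),
        3: str(moment.isoweekday() % 7),   # Monday is 1, Sunday is 0
        4: str(moment.hour),
        5: str(moment.minute),
        6: str(moment.second),
        7: str(moment.microsecond // 1000),
    }


def order_date_fields(moment: datetime, ordering: Optional[List[int]] = None) -> str:
    """Lay out fields in {ordering} order; unlisted fields are omitted."""
    if ordering is None:                   # descriptor absent: application's choice
        ordering = US_DEFAULT_ORDER
    fields = date_fields(moment)
    # Default separator when no {field separator} is given: a single space.
    return " ".join(fields[code] for code in ordering)
```

An empty ordering list produces an empty string, matching the rule that an empty {ordering} descriptor displays no date-time information.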

______________________________________
{year format    2    1:1 byte int}
    How the year is displayed:
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display in short form (last 2 digits only).
         2    display in short form if the date is within 50 years.
         3    always display in long form.
         4    display as a text string: 1991 becomes "one thousand
              nine hundred and ninety one".
{month format    3    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (1).
         2    display as abbreviated text (Jan).
         3    display as long text (January).
{day of week format    4    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (1); Monday is 1, Sunday is 0.
         2    display as abbreviated text (Mon).
         3    display as long text (Monday).
{day format    5    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (23).
         2    display as digits with a textual suffix (23rd).
         3    display as text ("twenty third").
{hour format    6    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (12).
         2    display as text ("twelve").
{minute format    7    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (12).
         2    display as text ("twelve").
{second format    8    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (12).
         2    display as text ("twelve").
{millisecond format    9    1:1 byte int}
        -1    display however the appearance is best.
         0    display according to defaults or more global rules.
         1    display as digits (100).
         2    display as fractions of a second (1/10).
{padding string    10    1:text}
    Characters are taken from this string as needed to pad numeric
    displays out to their normal width, as in 1/23/91 to 01/23/91.
    The default is no padding.
{field separator    11    *:text}
    Repeated instances of this descriptor give the characters that
    occur before the first field, between the first and second
    fields, between the second and third fields, and so on. If
    nothing is specified, fields are separated by a single space.
{alignment    12    1:1 byte int}
    How to align the date within the cell:
        -1    align in whatever way makes for the best display (e.g.,
              use the spreadsheet's default rule for displaying
              numbers).
         0    no specific rule (use the default or a more global
              setting).
         1    left-justify the date.
         2    center the value within the cell.
         3    right-justify the date.
______________________________________
Format Descriptors for Boolean Values
______________________________________
{boolean    8    group}
    Format information for the display of Boolean values.
{true string    1    *:text}
    The string used to denote TRUE. The default is TRUE.
{false string    2    *:text}
    The string used to denote FALSE. The default is FALSE.
{alignment    3    1:1 byte int}
    How to align the Boolean value within the cell:
        -1    align in whatever way makes for the best display (e.g.,
              use the spreadsheet's default rule for displaying
              numbers).
         0    no specific rule (use the default or a more global
              setting).
         1    left-justify the text.
         2    center the text within the cell.
         3    right-justify the text.
______________________________________
Format Descriptors for Text
______________________________________
{text    9    group}
{capitalization    1    1:1 byte int}
        -1    force upper case.
         0    leave case alone.
         1    force lower case.
{alignment    2    1:1 byte int}
    How to align the text within the cell:
        -1    align in whatever way makes for the best display (e.g.,
              use the spreadsheet's default rule for displaying
              numbers).
         0    no specific rule (use the default or a more global
              setting).
         1    left-justify the text.
         2    center the text within the cell.
         3    right-justify the text.
______________________________________
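Several of the numeric format descriptors combine at display time. The sketch below shows one plausible composition in Python, assuming a hypothetical `NumericFormat` record for the decoded {numeric format} group: the {multiplier} is applied, the magnitude is rounded to {decimal places}, digits are grouped with the {thousands separator}, the {currency string} is prepended and the {percent flag} string appended, and finally the sign-dependent prefix and suffix wrap the result. Carrying the sign entirely in {negative prefix string} (hyphen-minus by default), the 2-place default, and treating zero as positive are assumptions; {scientific}, {alignment}, and the special 0x80 and 0x7F decimal-place values are not handled.

```python
from dataclasses import dataclass


@dataclass
class NumericFormat:
    """Hypothetical decoded {numeric format} group (descriptor codes in comments)."""
    decimal_point: str = "."          # 1 {decimal point string}
    thousands_separator: str = ""     # 2 absent: thousands are not marked
    decimal_places: int = 2           # 3 assumed default; 0x80/0x7F not handled
    currency_string: str = ""         # 6 {currency string}
    percent: str = ""                 # 7 {percent flag}
    multiplier: float = 1.0           # 8 {multiplier}
    positive_prefix: str = ""         # 9
    negative_prefix: str = "-"        # 10
    positive_suffix: str = ""         # 11
    negative_suffix: str = ""         # 12


def format_number(value: float, fmt: NumericFormat) -> str:
    shown = value * fmt.multiplier    # affects the display only, not the cell's value
    # Round the magnitude to the requested places, grouping digits by thousands.
    digits = f"{abs(shown):,.{fmt.decimal_places}f}"
    integer, _, fraction = digits.partition(".")
    integer = integer.replace(",", fmt.thousands_separator)
    body = integer + (fmt.decimal_point + fraction if fraction else "")
    # Currency string first, percent string last; prefixes and suffixes go outside both.
    body = fmt.currency_string + body + fmt.percent
    if shown < 0:
        return fmt.negative_prefix + body + fmt.negative_suffix
    return fmt.positive_prefix + body + fmt.positive_suffix
```

With `NumericFormat(percent="%", decimal_places=1)` the value .5 displays as `0.5%`; adding `multiplier=100` and `decimal_places=0` gives `50%`, the interplay the {percent flag} and {multiplier} entries describe.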

Operators for !Expression

The operators used in a preferred embodiment are listed below. Note that an operator with a variable number of operands must be used in a subexpression (unless it happens to be the last operator in the expression). The postfix a b - is taken to mean (a - b).
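A composing program has to evaluate the postfix operand/operator sequence of an !Expression. The sketch below assumes the sequence has already been decoded into a Python list in which operands are plain numbers and operators are wrapped in a hypothetical `Op` record, and it covers only a subset of the operator codes in the table. The Ceiling, Floor, Truncate, and Round operators (codes 10 to 13) use Python's `decimal` module and read position b as rounding to a multiple of 10 to the bth power, the reading consistent with the table's worked examples; code 13 rounds halfway cases away from zero, a choice the table leaves open.

```python
import operator
from dataclasses import dataclass
from decimal import ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_UP, Decimal


@dataclass(frozen=True)
class Op:
    """Hypothetical marker for an {operator} code in a decoded token stream."""
    code: int


def _at_position(rounding: str):
    """Codes 10-13: expand a to decimal and round at position b (a multiple of 10**b)."""
    def apply(a: float, b: float) -> float:
        quantum = Decimal(1).scaleb(int(b))      # b = -1 -> 0.1, b = 2 -> 100
        return float(Decimal(str(a)).quantize(quantum, rounding=rounding))
    return apply


_UNARY = {0: lambda a: a, 1: operator.neg, 8: abs, 25: operator.not_}
_BINARY = {
    2: operator.add, 3: operator.sub, 4: operator.mul, 5: operator.truediv,
    6: operator.pow,
    10: _at_position(ROUND_CEILING), 11: _at_position(ROUND_FLOOR),
    12: _at_position(ROUND_DOWN),       # truncate: discard digits toward zero
    13: _at_position(ROUND_HALF_UP),    # halfway behavior unspecified; half-up chosen
    16: operator.ne, 17: operator.eq, 18: operator.lt, 19: operator.gt,
    20: operator.le, 21: operator.ge,
}


def evaluate(tokens):
    stack = []
    for token in tokens:
        if not isinstance(token, Op):
            stack.append(token)                       # operand value: push it
        elif token.code in _UNARY:
            stack.append(_UNARY[token.code](stack.pop()))
        elif token.code in _BINARY:
            b = stack.pop()
            a = stack.pop()                           # "a b -" means a - b
            stack.append(_BINARY[token.code](a, b))
        elif token.code == 28:                        # "a b c if"
            c, b, a = stack.pop(), stack.pop(), stack.pop()
            stack.append(b if a else c)
        else:
            # Code -1 or an unrecognized code: a composer may discard the expression.
            raise ValueError(f"unrecognized operator code {token.code}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

For example, `evaluate([113.4, 2, Op(10)])` returns 200.0 and `evaluate([1, 10, 20, Op(28)])` returns 10, matching the Ceiling and If entries below.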

______________________________________
Operation  # of
Code       Operands   Operation Definition
______________________________________
-1         variable   Unknown. Used when the extraction program is
                      unable to find a definition for the function,
                      or when the expression is obviously damaged.
                      A composition program treats this as it treats
                      any unrecognized operator code, by discarding
                      part or all of the expression.
 0         1          Unary plus (no operation; the result is the
                      operand).
 1         1          Unary subtraction (negation).
 2         2          Binary addition.
 3         2          Binary subtraction.
 4         2          Binary multiplication.
 5         2          Binary division. The result is not necessarily
                      integral.
 6         2          Raise to a power (a to the bth power).
 7         2          Remainder of division (modulus).
 8         1          Absolute value.
 9         1          Factorial.
10         2          Ceiling. The value a is expanded to decimal,
                      and a ceiling operation is done at decimal
                      position b, with positions to the left of the
                      decimal point being positive. The ceiling
                      operation increases the value of a or leaves it
                      unchanged. Examples:
                        1.39 -1 Ceil yields 1.4
                        -3.229 -2 Ceil yields -3.22
                        113.4 2 Ceil yields 200
                        -3.100 -1 Ceil yields -3.1
11         2          Floor. The value a is expanded to decimal, and
                      a floor operation is done at the decimal
                      position specified by b, as in Ceiling. Floor
                      decreases the value or leaves it unchanged.
                      Examples:
                        1.39 -1 Floor yields 1.3
                        -3.229 -2 Floor yields -3.23
                        113.4 2 Floor yields 100
                        -3.100 -1 Floor yields -3.1
12         2          Truncate. The value a is expanded to decimal,
                      and any digits right of the bth digit are
                      discarded, counting digits as in Floor and
                      Ceiling. Examples:
                        1.39 -1 Trunc yields 1.3
                        -3.229 -1 Trunc yields -3.2
                        133.4 2 Trunc yields 100
                        -3.100 -1 Trunc yields -3.1
13         2          Round. The value a is expanded to decimal, and
                      the value is rounded at the bth digit.
                      Examples:
                        1.39 -1 Round yields 1.4
                        1.34 -1 Round yields 1.3
                        -3.229 -2 Round yields -3.23
                        133.4 2 Round yields 100
                        -3.100 -1 Round yields -3.1
                      The result at halfway points is indeterminate,
                      as some machines always round upward, while
                      others might round up in some circumstances and
                      down in others.
14                    Reserved for Round Outward (if it is ever
                      needed).
15         2          Random value between a and b inclusive,
                      allowing non-integers, with equal probability.
                      b must be greater than or equal to a.
16         2          Inequality. TRUE if a <> b.
17         2          Equality. TRUE if a == b.
18         2          Less than. TRUE if a < b.
19         2          Greater than. TRUE if a > b.
20         2          Less than or equal to. TRUE if a <= b.
21         2          Greater than or equal to. TRUE if a >= b.
22         2          Logical Or.
23         2          Logical Exclusive Or.
24         2          Logical And.
25         1          Logical Not.
26         2          Logical Equivalence.
27         2          Logical Implication.
28         3          If. Given "a b c if", the value returned is b
                      if a is TRUE (nonzero) and c otherwise.
29         1          Exponent: e to the ath power.
                            30       1         log of a, base e.                                          31       2         log of a, base b.                                          32       1         square root.                                               33       2         bth root of a.                                             34       1         sign (-1, 0, 1)                                            35       1         radians to degrees                                         36       1         degrees to radians                                         37 . . . 61                                                                            1         sine, tangant, secant, *2 for co-,                                            *2 for arc-, *2 for hyperbolic: 24                                            functions.                                                 62       2         arctangent2                                                63       2         hyperbolic arctangent2                                     ______________________________________                                    
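The Ceiling, Floor, Truncate, and Round operators in the table above can be illustrated with a minimal Python sketch. It assumes that the decimal expansion of a is modeled by Python's `decimal` module and turns position b into the step 10^b, so b = -1 selects tenths and b = 2 selects hundreds, matching the table's examples. The function name `round_at`, the operator keys, and the choice of ROUND_HALF_UP for Round's halfway points are hypothetical choices made for illustration only.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_UP

# Assumed mapping from the table's operators to decimal rounding modes.
MODES = {
    "ceil": ROUND_CEILING,   # code 10: increases the value or leaves it unchanged
    "floor": ROUND_FLOOR,    # code 11: decreases the value or leaves it unchanged
    "trunc": ROUND_DOWN,     # code 12: discards digits, i.e. rounds toward zero
    "round": ROUND_HALF_UP,  # code 13: halfway behaviour here is an assumption
}


def round_at(a: str, b: int, op: str) -> Decimal:
    """Apply a rounding operator to the value a at decimal position b.

    Positions count as in the table: b = -1 is the tenths digit,
    b = 0 the units digit and b = 2 the hundreds digit.
    """
    step = Decimal((0, (1,), b))  # sign 0, digit 1, exponent b: the value 10 ** b
    return Decimal(a).quantize(step, rounding=MODES[op])


if __name__ == "__main__":
    for value, position in [("1.39", -1), ("-3.229", -2), ("113.4", 2), ("-3.100", -1)]:
        print(value, position,
              round_at(value, position, "ceil"),
              round_at(value, position, "floor"))
```

Because `quantize` keeps the exponent of the step, a result such as Ceiling of 113.4 at position 2 is held as `Decimal('2E+2')`, which compares equal to 200.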

Other embodiments may have different sets of operators.

21. Intermediate Form for the Spreadsheet of FIG. 19

FIG. 19 shows a simple spreadsheet display consisting of a single row with three columns. The first cell of the row contains an expression whose operands are constants; the second cell is empty and therefore has no cell segment; the third cell contains an expression, one of whose operands is the address of the first cell. The value of the first cell, 36, is used in computing the value of the third cell, 23. The intermediate spreadsheet structure representing the spreadsheet of FIG. 19 is printed below, using the following notation. The comments are, of course, not part of the intermediate spreadsheet structure:

    ______________________________________
    (x       SOS 605 for segment x
    )        EOS 639, sometimes shown as )x for clarity
    {y       SOD 611 for descriptor y
    }        EOD 617, sometimes shown as }y for clarity
    !        CTL 630
    a@b      integer value a, expressed in b bytes, a decimal value
    0a@b     integer value a, expressed in b bytes, a hexadecimal value
    ;        comments
    ______________________________________
    INTERMEDIATE SPREADSHEET STRUCTURE
    (spreadsheet                  ;start of spreadsheet
     (vector                      ;outermost vector begins
      {dimensionality             ;this vector
       2@2                        ; represents dim 2
      }
      {first-child-address        ;the first inner vector will have
       1@2                        ; an address of 1
      }                           ; (hence, row 1)
      (vector                     ;inner vector begins. It has address 1
                                  ; (because the first child address of
                                  ; the parent says so) and a dim of 1
                                  ; (because the parent dim is 2, and
                                  ; this doesn't say otherwise).
       (cell                      ;start of cell
        {datatype
         0102@2                   ;cell contains a 2 byte integer
        }
                                  ;address of cell is r1c0, since it
                                  ; didn't say otherwise.
        !expression               ;cell contains an expression:
        {operand
         {datatype
          0102@2
         }
         5@2                      ;2 byte int: 5
        }operand
        {operand
         {datatype
          0102@2
         }
         13@2                     ;2 byte int: 13
        }
        {operator
         2@2                      ;plus
        }
        {operand
         {datatype
          0102@2
         }
         2@2                      ;2 byte int: 2
        }
        {operator
         4@2                      ;multiply
        }                         ;formula is 5 13 + 2 *, or
                                  ; (5 + 13) * 2.
        36@2                      ;cell result is 36
       )cell
       (cell                      ;start of new cell
        {cell_address
         2@2                      ;address of this cell is r1c2
        }
        {datatype
         0104@2                   ;it contains a 4 byte integer result
        }
        23@4                      ;content of cell is 23
        !expression               ;cell contains an expression
        {operand
         {datatype
          0001@2                  ;this operand contains no constant,
         }                        ; hence datatype Unknown.
         {cell_reference          ;this operand is a cell reference
          {absolute_initial
           1@2                    ;first address is 1, i.e., row 1
          }
          {absolute_initial
           0@2                    ;next is 0, so reference is to r1c0
          }
         }cell_reference          ;reference is to a single cell
        }operand
        {operand                  ;next operand
         {datatype
          0102@2                  ;a 2 byte integer
         }
         13@2                     ;value of operand is 13
        }
        {operator
         2@2                      ;subtract
        }                         ;formula is r1c0 13 -, or r1c0 - 13
       )cell                      ;end of 2nd cell
      )vector                     ;finishing up
     )vector
    )spreadsheet
    ______________________________________
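The expressions in the listing are stored in postfix order: operands and operators appear in the sequence 5 13 + 2 *, and the value is recovered with a stack. The following is a minimal Python sketch of that evaluation. It assumes a hypothetical token form (a plain number for a constant operand, a ("ref", address) pair for a cell reference, and an operator name string) in place of the byte-level descriptors, and it keys operators by name rather than by operator code. Only the operand counts for Absolute value (code 8) and If (code 28) are taken from the operator table; `OPERATORS` and `evaluate` are illustrative names.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical operator names, each mapped to (operand count, function).
OPERATORS: Dict[str, Tuple[int, Callable[..., Any]]] = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "abs": (1, abs),                           # Absolute value, code 8
    "if": (3, lambda a, b, c: b if a else c),  # If, code 28: "a b c if"
}


def evaluate(tokens: List[Any], cells: Dict[Tuple[int, ...], Any]) -> Any:
    """Evaluate a postfix token sequence against already computed cell values."""
    stack: List[Any] = []
    for token in tokens:
        if isinstance(token, tuple) and token[0] == "ref":
            stack.append(cells[token[1]])      # cell reference operand
        elif isinstance(token, str):
            count, function = OPERATORS[token]
            if len(stack) < count:
                raise ValueError(f"operator {token!r} needs {count} operands")
            args = stack[-count:]              # leftmost operand was pushed first
            del stack[-count:]
            stack.append(function(*args))
        else:
            stack.append(token)                # constant operand
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]


if __name__ == "__main__":
    cells: Dict[Tuple[int, ...], Any] = {}
    cells[(1, 0)] = evaluate([5, 13, "+", 2, "*"], cells)        # (5 + 13) * 2
    cells[(1, 2)] = evaluate([("ref", (1, 0)), 13, "-"], cells)  # r1c0 - 13
    print(cells)  # {(1, 0): 36, (1, 2): 23}
```

The third cell can only be computed after r1c0, which is why the sketch evaluates the cells in address order; a real converter would follow whatever computation-order rules the spreadsheet specifies.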

22. Conclusion

The foregoing Description of a Preferred Embodiment has disclosed an intermediate spreadsheet structure which employs the same principles as the intermediate document structure to represent a spreadsheet being exchanged among spreadsheet programs. As shown in the Description, the intermediate spreadsheet structure can represent spreadsheets having any number of dimensions, can describe cell addresses, can describe the values of cells and the formulas used to obtain them, and can describe how the spreadsheet and the contents of its cells are to be displayed. The use of descriptors and control codes within segments, the nesting of cell segments in first-dimension segments, and the nesting of segments for the (n-1)th dimension in segments for the nth dimension provide the ease of processing, flexibility, and expandability characteristic of the intermediate document structure.
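The nesting principle summarized above can be sketched in Python. The sketch assumes a sparse spreadsheet given as a mapping from address tuples (outermost dimension first, so (row, column) for a two-dimensional sheet) to values, with None marking an empty cell. `Cell`, `Vector`, `build_vector`, and `to_intermediate` are hypothetical names, and the sketch builds only the nested segment structure; it does not emit descriptors or the byte-level start and end codes.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class Cell:
    address: int   # position within the enclosing first-dimensional vector
    value: Any


@dataclass
class Vector:
    dim: int                 # dimension represented by this vector segment
    address: Optional[int]   # position within the parent vector; None if outermost
    children: List[Union["Vector", Cell]] = field(default_factory=list)


def build_vector(cells: List[Tuple[Tuple[int, ...], Any]], dim: int,
                 address: Optional[int] = None) -> Vector:
    """Nest sorted (address, value) pairs whose addresses have `dim` indices."""
    if dim == 1:
        return Vector(1, address, [Cell(addr[0], value) for addr, value in cells])
    children: List[Union[Vector, Cell]] = []
    # Group by the outermost index: each group becomes one (dim-1) vector segment.
    for index, group in groupby(cells, key=lambda item: item[0][0]):
        inner = [(addr[1:], value) for addr, value in group]
        children.append(build_vector(inner, dim - 1, index))
    return Vector(dim, address, children)


def to_intermediate(sheet: Dict[Tuple[int, ...], Any], dim: int) -> Vector:
    """Drop empty cells, sort by address, then nest dimension by dimension."""
    non_empty = sorted(
        ((addr, value) for addr, value in sheet.items() if value is not None),
        key=lambda item: item[0],
    )
    return build_vector(non_empty, dim)


if __name__ == "__main__":
    # The spreadsheet of FIG. 19: row 1, columns 0 and 2 filled, column 1 empty.
    print(to_intermediate({(1, 0): 36, (1, 1): None, (1, 2): 23}, 2))
```

Sorting once and grouping on the outermost index means an empty row, matrix, or higher-dimensional element simply produces no group, so, as in the structure described above, only non-empty elements receive segments.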

The preferred embodiment of the intermediate spreadsheet structure disclosed herein is, however, only one possible embodiment thereof. For example, the basic structure of the intermediate spreadsheet may be maintained while employing different conventions regarding the codes which begin and end segments and descriptors and specify control specifiers. Further, the intermediate spreadsheet structure of the present invention is inherently expandable, and consequently, new descriptors or operators may be added. Thus the preferred embodiment disclosed herein is to be considered in all respects illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are intended to be embraced therein.

What is claimed is:
 1. A data processing system comprising: a) a memory having instructions and a data processing representation of a spreadsheet structure stored therein; b) a processor responsive to the instructions stored in said memory for converting the representation of a spreadsheet structure from a source form into a destination form, said representation of the spreadsheet structure having elements of at least one dimension, wherein the elements include a plurality of cells for holding information, said processor and instructions comprising: a first converter for converting the source form of the representation of the spreadsheet structure into an intermediate form, said intermediate form of the representation of the spreadsheet structure comprising: a cell segment for representing each non-empty cell in the spreadsheet structure; a first-dimensional vector segment for representing a first-dimensional element of the spreadsheet structure and which contains cell segments for any non-empty cells belonging to that element; and for each additional dimension, m, of the spreadsheet structure, a vector segment for each non-empty element of that dimension which represents the non-empty element and which contains vector segments for non-empty elements of the (m-1)th dimension of the spreadsheet structure; a second converter for converting the intermediate form of the representation of the spreadsheet structure into a destination form of the representation of the spreadsheet structure and for storing said destination form of the representation of the spreadsheet structure into said memory.
 2. A data processing system as recited in claim 1 wherein certain of the segments in the data processing representation include descriptors describing the contents of the segment.
 3. A data processing system as recited in claim 2 wherein a descriptor of the data processing representation of the spreadsheet structure may contain another descriptor, and each descriptor includes a cell reference descriptor for describing a location in the spreadsheet structure of one or more of the cell segments, and the cell reference descriptor contains one or more address descriptors for describing segment addresses.
 4. A data processing system as recited in claim 3 wherein the address descriptors of the data processing representation of the spreadsheet structure include global address descriptors; and certain of the cell reference descriptors contain global address descriptors ordered by dimension for a given cell segment and for every vector segment which contains the given cell segment.
 5. A data processing system as recited in claim 3 wherein the address descriptors of the data processing representation of the spreadsheet structure include local cell descriptors; and certain of the cell reference descriptors contain local address descriptors specifying only the address of the cell segment in the immediately containing vector segment.
 6. A data processing system as recited in claim 2 wherein the descriptors of the data processing representation of the spreadsheet structure include a group descriptor for defining a group of cells; and the group descriptor contains, for certain dimensions of the spreadsheet structure, address descriptors describing vertices of the group in each of the certain dimensions.
 7. A data processing system as recited in claim 6 wherein the group descriptor is contained in other descriptors including a name descriptor which defines the name of the group described in the contained group descriptor and a default descriptor which specifies default information for the cells within the group.
 8. A data processing system as recited in claim 1 wherein certain of the cell segments of the data processing representation of the spreadsheet structure contain a representation of an expression, and the representation of the expression includes an expression control marking the beginning of the representation and at least one operand descriptor describing an operand in the expression.
 9. A data processing system as recited in claim 8 wherein the representation of the expression may further include an operator descriptor describing an operation to be performed on the operand.
 10. A data processing system as recited in claim 9 wherein an operand descriptor of the data processing representation of the spreadsheet structure may further contain another representation of an expression, whereby expressions may be nested to any depth.
 11. A data processing system as recited in claim 1 wherein certain of the cell segments contain a representation of an expression and a current result from evaluation of the expression.
 12. A data processing system as recited in claim 11 wherein the cell segments that contain the current result from evaluation of the expression also contain an indicator of the data type of the current result.
 13. A data processing system as recited in claim 1 wherein the intermediate spreadsheet structure is of a form non-specific to a program that created the source spreadsheet structure.
 14. A data processing system as recited in claim 1 wherein the processor is programmed to read the data processing representation of the spreadsheet structure from the memory so that the data processing representation may be interchanged with other data processing systems.
 15. A data processing system as recited in claim 1 wherein each segment begins with a start indicator indicating both a start and a type of the segment, and each segment ends with an end indicator indicating an end of the segment.
 16. In a data processing system, a method of converting a representation of a spreadsheet structure from a source form into a destination form, comprising the steps of: a) converting the source form of the representation of the spreadsheet structure into an intermediate form, said intermediate form of a representation of the spreadsheet structure comprising: a cell segment for representing each non-empty cell in the spreadsheet structure; a first-dimensional vector segment for representing a first-dimensional element of the spreadsheet structure and which contains cell segments for any non-empty cells belonging to that element; and for each additional dimension, m, of the spreadsheet structure, a vector segment for each non-empty element of that dimension which represents the non-empty element and which contains vector segments for non-empty elements of the (m-1)th dimension of the spreadsheet structure; and b) converting the intermediate form of the representation of the spreadsheet structure into a destination form of the representation of the spreadsheet structure.